Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database


time to read 6 min | 1128 words

For some reason, the moment that people start working on Enterprise Solutions, there is a... tendency to assume that because we are going to build a big and complex application, we can afford to ignore best practices and proven practices.

That is just wrong. And it applies in many ways to many parts of the application, but nowhere does it apply as obviously as it does to the database.

These are just the results of a very short bit of research:

  • The metadata driven approach, AKA the 5 tables DB
  • The Blob
  • The 5th normalized form DB
  • The 5th denormalized form DB
  • The table a day DB
  • The obfuscated database - create table tbl15 (fld_a int, fld_b int, fld_g nvarchar(255) )

And those are just about the database structure, we haven't even gotten to the database behavior yet. Here we have interesting approaches such as rolling your own "Transactions" table, requiring 40 joins to get a simple value, using nvarchar as the ultimate extensibility, etc.

So, how do we approach building a database that we can actually show in public? Let us start with thinking about the constraints that we have for the application. The database needs to support...

  • OLTP for the application itself.
  • Reports.
  • Performance.
  • ETL processes to take data in and out of the system.
  • Large amounts of data.
  • A large number of entities.

Notice that nothing on this list suggests integrating with the application at the database level. Doing that is a fairly common antipattern.

We are going to try to keep a one to one mapping between the entity and the table structure, because that will make it significantly easier to work with the system. One to one means that the Account Name translates to Account.Name in the entity model, and Accounts.Name in the database model.

 


We probably want the usual auditing suspects, such as Id, CreatedAt, ModifiedAt, CreatedBy, ModifiedBy, Owner, OrganizationUnit. Maybe a few more, if this makes sense. Probably not too many, because that can become a problem.

Accessing and manipulating this in the application layer is going to be trivial, so I am not really going to cover that in depth.

What I do want to talk about is how we expose it to the outside world, and by that I mean reports and ETL processes.

We have several options for doing that. We can just let the reports and ETL processes read from the database tables directly. This is the simplest approach, I think.

Other options include views, stored procedures, and other DB thingies that we can use. I have seen systems where an entity was composed of several tables, and the report was done off of a view that joined it all together.

The underlying problem is that I have versioning to consider. I am going to dedicate a full post to the problem of building upgradable, user customizable applications, so I'll delay it until then, but let us just say that radically changing the database schema between versions will be painful for the users. The usual way to handle that is to only make promises for the views, and not for the tables themselves. That is a good way to handle that, in my opinion.

I would suggest putting those in a separate schema, to make it clearer that the separation is important. This also gives you the ability to later do a view to table refactoring, and maintain that with triggers. This is usually a bad idea, but for performance reasons, it may very well be a good solution.

ETL processes can use the same mechanism that reports use to read data from the database efficiently, but nothing writes to the database except the application. At one point I wrote a change DB password util that ran every two hours; it would change the database password and update the relevant config files.

I think that you can guess what the driving force behind that was, no?

Getting data into the database can be done through application services (not necessarily web services, btw). A simple example would be an API similar to this one:

void Insert(params Account[] accounts);
void Update(params Account[] accounts);
void Delete(params Account[] accounts);

This API explicitly allows for bulk operations, so it can be very nice to work with, instead of having to do things on a per-row basis, which basically kills performance.
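To make the bulk point concrete, here is a minimal in-memory sketch. AccountService, its RoundTrips counter and the shape of Account are all made up for the illustration; a dictionary stands in for the database, and a real implementation would send each batch to the server in a single round trip.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity, with only the fields the sketch needs.
public class Account
{
    public int Id;
    public string Name;
}

// Hypothetical application service. The dictionary stands in for the database,
// and RoundTrips counts how many "trips" to it were made.
public class AccountService
{
    private readonly Dictionary<int, Account> store = new Dictionary<int, Account>();

    public int RoundTrips { get; private set; }
    public int Count { get { return store.Count; } }

    public void Insert(params Account[] accounts)
    {
        RoundTrips++; // one trip per call, no matter how many accounts were passed
        foreach (Account account in accounts)
            store.Add(account.Id, account);
    }

    public void Update(params Account[] accounts)
    {
        RoundTrips++;
        foreach (Account account in accounts)
            store[account.Id] = account;
    }

    public void Delete(params Account[] accounts)
    {
        RoundTrips++;
        foreach (Account account in accounts)
            store.Remove(account.Id);
    }
}
```

Calling service.Insert(first, second, third) costs one trip here, where a per-row API would have cost three.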

How to get good performance from this system is another hard question. In this case, I would usually recommend getting a good DBA to look at the performance characteristics of the application and optimize the database structure if needed. But a much easier solution to performance problems in the database server is to not hit the DB server at all, and use caching instead. Distributed caching solutions, like Memcached, NCache, etc. are a really good way of handling that.
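The "don't hit the DB server" idea boils down to a cache-aside lookup, which can be sketched roughly like this. CachedLookup and its DatabaseHits counter are hypothetical names, a plain Dictionary stands in for a distributed cache, and expiration and invalidation are left out entirely.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache-aside helper. A Dictionary stands in for a distributed
// cache such as Memcached; expiration and invalidation are not handled.
public class CachedLookup<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> cache = new Dictionary<TKey, TValue>();
    private readonly Func<TKey, TValue> loadFromDatabase;

    // Counts how often we actually had to go to the database.
    public int DatabaseHits { get; private set; }

    public CachedLookup(Func<TKey, TValue> loadFromDatabase)
    {
        this.loadFromDatabase = loadFromDatabase;
    }

    public TValue Get(TKey key)
    {
        TValue value;
        if (cache.TryGetValue(key, out value))
            return value; // served from the cache, the database is not touched

        DatabaseHits++;
        value = loadFromDatabase(key);
        cache[key] = value;
        return value;
    }
}
```

The hard part in practice is deciding when a cached value is stale, which this sketch deliberately ignores.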

No business logic in the DB! This is important: if you put business logic in the DB, you have to get to the DB in order to execute the business logic. This kills scalability, hurts the ability to understand the solution, and in general makes life miserable all around.

Reports are an interesting problem. How do you deal with security, for instance? Consider the security infrastructure that I already presented. This security infrastructure should also come with a database function that you can use like this:

SELECT * FROM Accounts 
WHERE IsAllowed(Accounts.EntitySecurityKey, @UserId, 'Account.View')

Or this:

SELECT * FROM Accounts
WHERE Accounts.EntitySecurityKey IN (
	SELECT EntitySecurityId FROM GetAllowed(@UserId, 'Account.View') 
)

Both of these provide a really easy way to get security for the reports. If we wanted to enforce that, we could force the report writer to write something like this:

SELECT * FROM GetAllowedAccounts(@UserId, 'Account.View')

We can probably get away with assuming that 'Account.View' is the default operation, so it is even shorter. Among other things, this actually has valid performance characteristics.

This post is turning out to be a "don't make stupid mistakes" post, because I don't think that I am writing anything new here. As for how to avoid making stupid mistakes, that is fairly simple as well. Get a good DBA (that is left as an exercise for the reader), give him/her a big stick and encourage good design through superior firepower.

time to read 5 min | 901 words

So I was in Jeremy Miller's talk about the maintainable software ecosystem, and one of the things that he mentioned that StructureMap does really well is the ability to ask the container to perform environment validations, to make sure that the environment is ready for us.

I really liked the idea, so I pulled up the laptop and started spiking how to handle this issue. First, let us see Jeremy's solution:

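public class FileTextReader : ITextReader
{
	// Supplied by the container (see the "fileName" dependency in the test further down).
	private readonly string fileName;

	public FileTextReader(string fileName)
	{
		this.fileName = fileName;
	}

	[ValidateConfiguration]
	public void ValidateFileExistance()
	{
		if (File.Exists(fileName) == false)
			throw new FileNotFoundException("Could not find file " + fileName);
	}
}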

So, when you ask StructureMap to validate the environment, it will run all the methods that have been decorated with [ValidateConfiguration].
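Stripped of any container, "run all the decorated methods" is a small piece of reflection. The following is a sketch of the general technique only, not StructureMap's actual implementation; SampleComponent and EnvironmentValidator are hypothetical names, and the attribute is declared here just so the sample stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Marker attribute, declared here only to keep the sample self-contained.
[AttributeUsage(AttributeTargets.Method)]
public class ValidateConfigurationAttribute : Attribute { }

// Hypothetical component with one validation method and one regular method.
public class SampleComponent
{
    public bool Validated;

    [ValidateConfiguration]
    public void CheckEnvironment() { Validated = true; }

    public void DoWork() { throw new InvalidOperationException("should never be invoked by the validator"); }
}

public static class EnvironmentValidator
{
    // Invokes every public, parameterless method decorated with the attribute
    // and returns the names of the methods that were run.
    public static List<string> Validate(object component)
    {
        List<string> executed = new List<string>();
        foreach (MethodInfo method in component.GetType().GetMethods())
        {
            bool isValidateMethod = method.IsDefined(typeof(ValidateConfigurationAttribute), true);
            if (isValidateMethod == false || method.GetParameters().Length != 0)
                continue;

            method.Invoke(component, null);
            executed.Add(method.Name);
        }
        return executed;
    }
}
```

A failing validation method would surface as a TargetInvocationException from Invoke, with the real error in InnerException.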

So, how do we do that in Windsor?

The most important thing to realize with Windsor is that it is a container that was built to be extensible. Something like this is not going to be a change to the container itself, it will be an extension. Extensions are usually facilities, like this one:

public class ValidationFacility : AbstractFacility
{
	private readonly List<string> componentsToValidate = new List<string>();

	protected override void Init()
	{
		Kernel.AddComponent<ValidateConfiguration>();
		IHandler handler = Kernel.GetHandler(typeof(ValidateConfiguration));
		handler.AddCustomDependencyValue("componentsToValidate",
			 componentsToValidate
			);
		Kernel.ComponentRegistered += OnComponentRegistered;
	}

	public void OnComponentRegistered(string key, IHandler handler)
	{
		foreach (MethodInfo method in handler.ComponentModel.Implementation.GetMethods())
		{
			bool isValidateMethod = method
				.GetCustomAttributes(typeof(ValidateConfigurationAttribute), true)
				.Length != 0;
			if (isValidateMethod)
			{
				componentsToValidate.Add(key);
				break;
			}
		}
	}
}

This extends the container, and whenever a component is registered, I check whether I need to add it to the list of components that need validation. I am doing a tiny bit of cheating here by passing componentsToValidate by reference to the component. It is simpler that way, but the component gets the same instance, which is probably not what I would want in other scenarios. I would usually go with a sub resolver to handle that, if I was doing something like this for more interesting purposes.

Anyway, here is how the ValidateConfiguration class is built:

public class ValidateConfiguration
{
	private readonly ICollection<string> componentsToValidate;
	private readonly ILogger logger;
	private readonly IKernel kernel;

	public ValidateConfiguration(
		ICollection<string> componentsToValidate,
		ILogger logger,
		IKernel kernel)
	{
		this.componentsToValidate = componentsToValidate;
		this.logger = logger;
		this.kernel = kernel;
	}

	public void PerformValidation()
	{
		foreach (string key in componentsToValidate)
		{
			ValidateComponent(key);
		}
	}

	private void ValidateComponent(string key)
	{
		IHandler handler = kernel.GetHandler(key);
		if (handler == null)
		{
			logger.Warn("Component {0} was removed before it could be validated", key);
			return;
		}
		try
		{
			object component = handler.Resolve(CreationContext.Empty);
			foreach (MethodInfo method in component.GetType().GetMethods())
			{
				// Only methods that carry the attribute are validation methods.
				bool isValidateMethod = method.GetCustomAttributes(typeof(ValidateConfigurationAttribute), true).Length != 0;
				if (isValidateMethod)
				{
					ExecuteValidationMethod(component, method);
				}
			}
		}
		catch (TargetInvocationException e)
		{
			logger.Error("Failed to run validation for {0}, because: {1}", key, e.InnerException);
		}
		catch (Exception e)
		{
			logger.Error("Failed to run validation for {0}, because: {1}", key, e);
		}
	}

	private void ExecuteValidationMethod(object component, MethodBase method)
	{
		try
		{
			method.Invoke(component, new object[0]);
		}
		catch (Exception e)
		{
			logger.Error("Failed to validate {0}.{1}. Error: {2}",
				method.DeclaringType,
				method.Name,
				e);
		}
	}
}

This is a class that has some deep association with the container. It is usually not something that I would like in my application services, but it is fine for infrastructure pieces, like this one.

Now that I have that, I can actually test the implementation:

IWindsorContainer container = new WindsorContainer();
container.AddFacility("validation", new ValidationFacility());
container.AddComponent<ITextReader, FileTextReader>();
container.Kernel.GetHandler(typeof(ITextReader))
	.AddCustomDependencyValue("fileName", "foo");
container.AddComponent<ILogger, ConsoleLogger>();

ValidateConfiguration resolve = container.Resolve<ValidateConfiguration>();
resolve.PerformValidation();

And this will go over everything and perform whatever validations need to be done.

As I said, I really like the idea, and extending this to a build task is really trivial (especially if you are using Boo Build System to do things).

The main point, however, is that I managed to write this piece of code (around 100 lines or so) during Jeremy's talk, so between the time he talked about that feature and the time he finished, I already got that done. This actually has nothing to do with my personal prowess with code, but it has a lot to do with the way Windsor is built, as a set of services that can be so readily extended.

Once I got used to the style that Windsor has, it became really addictively easy to start extending the container in interesting ways. I highly recommend that you take a look at those features, they are interesting both from "what I can do with them" and from "what design allowed this".

time to read 2 min | 356 words

I got some feedback about my previous review, saying that PetShop 2.0 was recognized as architecturally unsound, and that I should look at version 3.0 of the code, which is:

Version 3.x of the .NET Pet Shop addresses the valuable feedback given by reviewers of the .NET Pet Shop 2.0 and was created to ensure that the application is aligned with the architecture guideline documents being produced by the Microsoft.

I have to say, it looks like someone told the developers, we need an architecture, go build one. The result is... strange. It makes my spider sense tingle. I can't really say that it is wrong, but it makes me uncomfortable.

Take a look at the account class, and how it is structured:

image

Well, I don't know about you, but that is a poor naming convention to start with. And I am seeing here an architecture by rote, if this makes any sort of sense.

Then there are such things as:

image

Which leads us to this:

image

The MSG_FAILURE is:

image

I am sorry, but while there was some effort made here over the previous version, I am not really impressed with it. As I said, the architecture is now probably sound, if suspicious because of lack of character, but the implementation is still not really one that I would call decent. I have to admit to a strong bias here, though. I don't like the naked CLR, but the code has missed a lot of opportunities to avoid unnecessary duplication and work.

I have also been asked what I would consider a good sample application, and I would like to recommend Cuyahoga, as the application that probably models my thinking the best. SubText is also good, but it is a more interesting case, because I don't like its reliance on stored procedures. Nevertheless, it is a valid approach, and it is certainly serving this blog very well.

time to read 2 min | 316 words

I gave a talk about ReSharper today, and I used the PetShop demo app as the base code. I have purposefully avoided looking at the source code of the sample until today, because I wanted to get a genuine experience, rather than a rehearsed one. I don't think it went as well as it could have, but that is not the point of this post. The point is to talk about just the code quality of the PetShop application.

First, let us see what the PetShop application is:

The Microsoft .NET Pet Shop 2.0 illustrates basic and advanced coding concepts within an overall architectural framework

The Pet Shop example is supposed to illustrate "coding concepts", but mostly it demonstrates those that you want to avoid. I was shocked by what I was seeing there.

I am going to touch just the account class, because it has enough issues all on its own. The account class is supposed to be a domain entity, but what do you see when you open it?

image

And I really don't like to see SQL inside my code.

And then there is this:

image

And suddenly I am confused, because I see a class that is doing the work of three classes, and that is just by casual browsing.

And then we had this:

image

Somehow, public fields and violating the .NET naming conventions don't really strike me as a good idea.

Then there is code duplication. Account.Insert() and Account.Update(), for example, both contain significant duplication in the form of:

image 

I thought that the whole idea of a sample app was to show off best practices, and that is... quite not so, in this case.

time to read 2 min | 285 words

I am having quite a few interesting discussions at DevTeach, and one of those had to do with introducing projections and processes against opposition. For myself, I am a... bit forceful about such suggestions, especially in the face of stupid opposition.

One of the things that came up was simply to do it, the old "it is easier to ask for forgiveness than permission". I am both supportive of that and not really comfortable with the idea.

I support it because it is a way to actually get things done, but I got a really good example of why it is not always a smart idea. The story was using Rhino Mocks for mocking, with some team members starting to use it without proper introduction.

The resulting tests passed, but had a strong coupling to the code under test (too many mocks, too many expectations). When the code changed, the tests broke, because they were specifying too much.

For myself, I have seen similar issues that can result from slipping stuff under the radar, which is why I am not comfortable with that in most cases.

It is not always the case. Continuous integration is one example where there usually isn't a problem in just setting it up. But if you are adding a dependency to the system, you need to make it clear to the team how it works. Doing otherwise introduces a bus factor problem, damages the ability of the team, and causes a host of other problems.

By the way, this doesn't mean that all your team members have to have a vote in any dependency, or any pattern, but it does mean that they all should be aware of them.

time to read 1 min | 191 words

Ivan also posted a response to Jeremy's C# vNext features, and he said something that caught my eye:

5. Metaprogramming => no objection here although you can do it now with reflection emit

No, there is a big difference between reflection emit (or byte code weaving) and meta programming. Reflection emit is done after compilation is completed, meta programming occurs during compilation. This matters because with the former, your code cannot refer to the results of the change being made.

A case in point, I am using IL Weaving to add the Dispose method to a class, but because I am doing it after compilation is completed, I cannot call the Dispose() method on the class, or put it in a using statement, in the same assembly, I would get a compiler error, because the method (and the interface) are not there yet.

Using meta programming, the compiler will know that those now exist, and the code can start using them right away, because the change happened during the compilation process.

The implications of that are pretty significant, if you are talking about what you can and can't do in terms of enriching the language.

Indistinguishable

time to read 1 min | 67 words

Nikhil has a post about using MS Ajax with MS MVC*.

What was particularly interesting to me was that it reminded me very strongly of posts that I wrote, exploring Ajax in MonoRail. The method used was the same, the only changes were extremely minute details, such as different method names with the same intention, etc.

* Can we please get better names for those?

time to read 1 min | 157 words

Bill Wagner has a proposal about the usage of mixins. He is talking about having a marker interface with minimal methods (or no methods), and then extending that using extension methods. To a point, he is correct, this will give you some sense of what a mixin is. But it is not enough.  

It is not enough because of the following reasons:

  • It is not really a cohesive solution. There is no really good way to specify something like SnapshotMixin. You need an interface, a static class, a type that inherits from the marker interface, etc. Those are... problematic. I want to just be able to say: "also give me the functionality of this class"
  • A more important issue is one of state. The examples in Bill's proposal are all stateless methods, but I want to have a stateful mixin. I can think of several hacks around that, but they are hacks, not a proper way to work.
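To show both complaints in one place, here is a rough sketch of the marker interface plus extension methods approach, including one possible hack for state. ISnapshotable, SnapshotMixin's members and Customer are hypothetical names, and the side table built on ConditionalWeakTable is my own workaround for the illustration, not something from Bill's proposal.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Marker interface with no members.
public interface ISnapshotable { }

// The "mixin": extension methods over the marker interface.
public static class SnapshotMixin
{
    // The state hack: a side table keyed by instance, because extension
    // methods cannot add fields to the classes they extend.
    private static readonly ConditionalWeakTable<ISnapshotable, List<string>> snapshots =
        new ConditionalWeakTable<ISnapshotable, List<string>>();

    public static void TakeSnapshot(this ISnapshotable self)
    {
        snapshots.GetOrCreateValue(self).Add(self.ToString());
    }

    public static IList<string> Snapshots(this ISnapshotable self)
    {
        return snapshots.GetOrCreateValue(self);
    }
}

// A class opts in by implementing the marker interface.
public class Customer : ISnapshotable
{
    public string Name;

    public override string ToString() { return "Customer:" + Name; }
}
```

It works, but notice how many moving parts are needed for what should be a single declaration, and that the state lives outside the object it belongs to.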

C# vNext

time to read 3 min | 435 words

Jeremy Miller asks what we want in C# vNext. I have only one real request: meta programming of sufficient power, after which I will be able to add all the required semantics without having to argue with the compiler team.

I am not holding my breath on that one, though. I can just imagine the arguments against it (let us start from the potential for abuse, move to versioning and backward compatibility hell, and then move forward).

I want to go over Jeremy's list, and see what I can add there.

  1. Mixins - Agree 102%. This is something that would be so useful, I can't really understand how it is not already there. Make it a magic attribute, something like [Mixing(typeof(IFoo), typeof(FooImpl))], and you can get away with it with just compiler magic, no changes required to the CLR.
  2. Symbols - I am ambivalent on that one. Syntactic sugar is important, but I have other things that I would value more.
  3. Make hashes a language feature - I think that you can do it right now with this syntax:
     var hash = new Hash(
    	Color => "red",
    	Width => 15
    );
  4. Automatic delegation ala Ruby or Objective C - Um, isn't this just mixin?
  5. Metaprogramming! - absolutely. This is something that I have come to consider as basic. I am getting tired of having to fight the compiler to get the code that I want to have. The code should express my meaning, I shouldn't have to dance to the compiler's tune.
  6. Everything is virtual by default to make mocking easier - I certainly would like that, but I fear that this is not something that will be changed. AOP as a platform concept, now that is something that I want to see.
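The hash syntax from the list above can be backed by something like the following sketch. The Hash class itself is an assumption (the post only shows the call site); the trick is that each lambda's parameter name is still available through reflection, so it can serve as the key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Hash class: the parameter name of each lambda becomes the key,
// and whatever the lambda returns becomes the value.
public class Hash : Dictionary<string, object>
{
    public Hash(params Func<object, object>[] entries)
    {
        foreach (Func<object, object> entry in entries)
        {
            // The lambda parameter is never used, only its name matters.
            string key = entry.Method.GetParameters()[0].Name;
            this[key] = entry(null);
        }
    }
}
```

With that in place, new Hash(Color => "red", Width => 15) produces a dictionary with the keys "Color" and "Width". It is cute, but it relies on reflection over every entry, so I would not use it on a hot path.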

My own request covers:

  1. memberinfo() - the CLR already has this concept, so we just need the syntax for it.
  2. Method Interception - let us start with the easy stuff, I want to be able to intercept methods from any type. I can do it now if I want to mess with the profiler API, but that is not something that I can really make use of for production.
  3. IDynamicObject - I want method missing, damn it! It is just scratching the surface of meta programming, but this is something that you could probably add to the compiler in a week.
  4. Static interfaces. Since we already have generics to allow us to treat types as interchangeable, I want to extend this concept by just a bit, to get it to work in a more reasonable manner.

I have a few more, but they just called my flight.
