Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,567
|
Comments: 51,185
Privacy Policy · Terms
filter by tags archive
time to read 8 min | 1449 words

In log4net 1.2.9 there is a bug in the AdoNetAppender. It will not log null values appropriately. This bug is fixed in log4net 1.2.10, but it is not always possible to move to the next version. (In my case, both NHibernate and Castle uses log4net 1.2.9)

I tried recompiling everything to use 1.2.10, but it took too long, and eventually I simply wrote this little trigger:

CREATE TRIGGER [ReplaceNullLookAlikeWithNulls]

   ON  [dbo].[Logs]

   INSTEAD OF INSERT

AS

BEGIN

      -- SET NOCOUNT ON added to prevent extra result sets from

      -- interfering with SELECT statements.

      SET NOCOUNT ON;

 

   INSERT INTO .[dbo].[Logs]

           ([Date]

           ,[Thread]

           ,[Level]

           ,[Logger]

           ,[Message]

           ,[Exception]

           ,[Filename]

      SELECT

              [Date]

              ,[Thread]

              ,[Level]

              ,[Logger]

              ,[Message]

              ,[Exception]

              ,case [Filename] when '(null)' then NULL else [Filename] end

       FROM Inserted

END

Not the best solution, but it will hold water.

time to read 12 min | 2212 words

I need to be able to ask the database for some rows to work on, and I need to be sure that I am the only one that works on those rows. After thinking about it, I came up with this SQL:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

BEGIN TRANSACTION

 

      INSERT INTO Locks(LockedBy, LockedAt, LcokedItemId, LockedItemType)

      OUTPUT inserted.Id

      SELECT TOP 100 @lockedBy, getdate(), item.Id, @itemType

      FROM Items item

      WHERE NOT EXISTS (SELECT 1 FROM Locks lock

            WHERE lock.LcokedItemId = item.id AND lock.LockedItemType = @itemType)

      ORDER BY item.Id

 

COMMIT TRANSACTION

This is the first time I need to implement pessimistic locking, and I am not sure if this is the right way to go. The work I am doing is disconnected, so I can't just lock the rows and continue to work.

Any suggestions?

Update: This is what I have now:

CREATE PROCEDURE lockAndGetItems AS

 

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

BEGIN TRANSACTION

 

      DECLARE @lockedBy uniqueIdentifier, @countOfItemsToLock int

      SET @lockedBy  = '68A6D54C-557D-428D-8B82-5D68C9C1B33E'

 

      -- Get number of rows already locked by me

      SELECT @countOfItemsToLock = 100 - COUNT(Id) From Locks where LockedBy = @lockedBy

 

      -- Lock rows to complete to an even 100

      INSERT INTO Locks(LockedBy, LockedAt, LockedItemId)

      SELECT TOP (@countOfItemsToLock)  @lockedBy, getdate(), item.Id

      FROM Items item

      WHERE NOT EXISTS (SELECT 1 FROM Locks lock

            WHERE lock.LockedItemId = item.id and lock.LockedBy = @lockedBy)

      ORDER BY item.Id

 

      -- Return row data

      SELECT Items.* FROM Items JOIN Locks ON Items.Id = Locks.LockedItemId AND LockedBy = @lockedBy

     

COMMIT TRANSACTION

Basically it fetch the first 100 items designated for this SP. Each SP has a single caller, so I don't worry about expiring the lock. I will always get the locked rows. In the case of a crash when the rows are locked, when the client is restarted, it will get the same rows that it already locked, and can start working on them. The reason that I implement is as a SP with hard coded Guid is that each SP locks a different table.

time to read 4 min | 614 words

Moran has pointed me to the CROSS APPLY syntax in T-SQL 2005, which does allows joining against a table valued function. Using this, the query goes down to this:

SELECT

        EmpName,

        CONVERT(NVARCHAR, CurrentDate,103) Date,

        HasWorked = CASE SUBSTRING(Roster,

                dbo.IndexInRoster(StartDate,CurrentDate,LEN(Roster)), 1)

                        WHEN '_' THEN 0

                        WHEN '*' THEN 1

                        ELSE NULL

            END

FROM Schedules CROSS APPLY DateRange(StartDate,EndDate)

This is much nicer way to deal with it. Considerring that I am using similar techniques all over the place, this is a really good thing to know.

time to read 55 min | 10901 words

I'm starting to get quite a bit of mail from this blog. Some of those questions are about subjects I can answer immediately, some require a fair amount of work (which can be had, if you really want), and the more interesting ones are those that require some thinking, but does not require too much time. This question from Dave is the best one so far, and I got his permission to blog about it, so I'm doubly happy.

The issue is working against a legacy database to get the data for further processing. I'll let Dave explain the issue, since he does it much better:

I have to write a query to generate a report over some interesting data.  It's basically scheduling which days people are working.  The data looks like this:

Id EmpName StartDate EndDate Roster
1 Bob 12/06/2006 18/06/2006 _*___**
2 Mary 12/06/2006 18/06/2006 *_*__*_

The trick is, the roster field contains a string with a _ or * depending on wether the person is scheduled to work that day or not, but the first character always starts on the sunday.  The startdate and enddate can be any day of the week.
In the example above, the 12-jun is a monday, so monday corresponds to the second character in the roster string, so Bob's working and Mary's not.
The roster string wraps around, so the first character of the roster string actually corresponds with the enddate here!  Now, this roster string could be 7, 10, 14 days long. 

I could get the report out if I can write a query to get it to this:

 

Employee DateWorking
Bob 12/06/2006
Bob 16/06/2006
Mary 13/06/2006
Mary 16/06/2006

 

By the way, I haven't asked, but I'll bet that this schema has originated from a MainFrame, if not currently, than in its recent past.

First I created the schema I needed:

 

CREATE TABLE Schedules

(

        Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,

        EmpName NVARCHAR(255) NOT NULL,

        StartDate DATETIME NOT NULL,

        EndDate DATETIME NOT NULL,

        Roster NVARCHAR(50) NOT NULL

);

GO

 

INSERT INTO Schedules

SELECT 'Bob','12-Jun-06','18-Jun-06','_*___**'

UNION ALL

SELECT 'Mary','12-Jun-06','18-Jun-06','*_*__*_'

 

GO

 

Then, I started playing with DatePart(), getting the day of the week of StartDate from each row. This gave me the index I needed into the Roster column. But, this only told me whatever the employee worked or didn't work on the start date, which isn't very helpful. What I needed was a way to check for all the values between StartDate and EndDate.

I posted about this issue a while ago, and I made use of this techqnique here:

 

CREATE FUNCTION DateRange ( @start datetime, @end datetime )

RETURNS @DateRange TABLE ( CurrentDate datetime )

AS

BEGIN

      WHILE (@start <= @end)

      BEGIN

            INSERT INTO @DateRange(CurrentDate) VALUES(@start)

            SELECT @start = DATEADD(day,1,@start)

      END

      RETURN

END

GO

Conceptually, what I wanted was this:

SELECT

      IndexInRoster = DatePart(dw,StartDate) + DateDiff(day,StartDate, CurrentDate)

FROM Schedules, DateRange(StartDate,EndDate)

Unfortantely, DateRange() is a table valued function, and what this query ask from SQL Server is to join each row in the Schedules table to another table. This is not possible, of course.

I settled on faking it using this appraoch:

WITH AllDatesInTable(CurrentDate) AS

(

        SELECT CurrentDate FROM dbo.DateRange(

                  (SELECT MIN(StartDate) FROM Schedules),

                        (SELECT Max(EndDate) FROM Schedules) )

)

SELECT

      TestIndexInRoster = DatePart(dw,StartDate) + DateDiff(day,StartDate, CurrentDate)

FROM Schedules JOIN AllDatesInTable

ON CurrentDate BETWEEN StartDate AND EndDate

This query uses Common Table Expression to define a table that has all the dates in the Schedules table. Notice that I constrained it to all the dates in the current row. In essense, this give me a row per each date in the date range of each row. This is the basis of solving this problem.

The other issue is the wrapping of the day index in the roster. This is a bit complicated because we need to take into account three things. SQL Server string handling is 1 base, not 0 based (argh!), we are shifting based on the start date functionality, and we need to wrap around correctly. In order to handle this issue I created this function:

CREATE FUNCTION IndexInRoster(@StartDate DATETIME, @CurrentDate DATETIME, @RosterLen INT)

RETURNS INT AS

BEGIN

      DECLARE @Result int

      SET @Result = (DATEDIFF(day,@StartDate,@CurrentDate) + DATEPART(dw,@StartDate)) % (@RosterLen)

      IF @Result = 0

            RETURN @RosterLen

     

      RETURN @Result

END

GO

The check for @Result equals 0 is there because SQL Server is using 1 based string handling.

Brining it all together, we get this:

WITH AllDatesInSchedulesTable(CurrentDate) AS

(

        SELECT CurrentDate FROM dbo.DateRange(

                  (SELECT MIN(StartDate) FROM Schedules),

                        (SELECT Max(EndDate) FROM Schedules) )

)

SELECT

        EmpName,

        CONVERT(NVARCHAR, CurrentDate,103) Date,

        HasWorked = CASE SUBSTRING(Roster,

                dbo.IndexInRoster(StartDate,CurrentDate,LEN(Roster)), 1)

                        WHEN '_' THEN 0

                        WHEN '*' THEN 1

                        ELSE NULL

            END

FROM Schedules JOIN AllDatesInSchedulesTable

ON CurrentDate BETWEEN StartDate AND EndDate

And the result of this query:

EmpName Date HasWorked
Bob 12/06/2006 1
Mary 12/06/2006 0
Bob 13/06/2006 0
Mary 13/06/2006 1
Bob 14/06/2006 0
Mary 14/06/2006 0
Bob 15/06/2006 0
Mary 15/06/2006 0
Bob 16/06/2006 1
Mary 16/06/2006 1
Bob 17/06/2006 1
Mary 17/06/2006 0
Bob 18/06/2006 0
Mary 18/06/2006 1

And from here it is trivial to get to whatever format you want.

time to read 5 min | 998 words

There is another holy war converging about the SP vs. Parameterized SQL  topic. I got into one not long ago, and I wouldn't get into this one if there wasn't some new stuff brewing in it.

Eric Wise started this with a The Pragmatic Adhoc SQL vs Stored Procedures Discussion, Jeremy D. Miller responded with Why I do not use Stored Procedures and then Eric posted a new post, this time with In Response To No Stored Procedures which has some new questions that I have not seen before.

1.       About the security, naturally, security is a multi layered approach. In the database, I would grant permissions to views or tables according to the user's need. Granting a SELECT permissions does not equals to letting the user SA access to the database, after all. I also tend to add another layer in the application level, which is completely transparent to my code and my business objects, you can see a high level overview here. Then there is the application logic itself.

2.  Transactional  Database Structures : A good coverage of the topic can be found in Roy 's What payroll system are a pain to design – Temporal data  and in Eric's "Transactional" Database Structures.  I am currently working on a big payroll system which is using NHibernate, and this issue is a major PITA. The problem is that both the relations between objects and the objects themselves are changed as time goes by.  We solved this issue by deciding that each object is the system is composed of Id, which cannot be changed, and a history, which contains immutable version of the data for a specific date. A highly simplified version of this will look like this:

This way, I can ask any object for a view of how it was in a point in time. The views also handle the associations, so if I ask an employee to see its view in 2000-12-31, I can easily get any data that I need without dealing with the complex timed model. For instance, if I want to check if the employee had a  car in a certain date, I can use this code:

EmployeeView employeeView = emp.For(new DateTime(2000,12,31));

if(employeeView.BenefitPackage.HasCar)

    Console.WriteLine("The employee {0} had a car in {1}",
                    employeeView.Name, employeeView.Date);


This model translate very well into something NHibernate can handle. It reduce a lot of the problem that you have when you deal with such system, since  the infrastructure code remove a lot of the  problems that you get in such models. To set a new property, I ask the employee to give me its current status, change whatever I want, and then I give it back and saves. A new record is inserted to the database, and the employee's current state moves to the newest piece of data.

3.       OR/M in complex databases structures. The system above gets its schema from a main frame 30 years old or more, written by a bunch of guys that thought that transliteration Hebrew in English is fun (with spelling mistakes!). Enough said, I think.

4.       Stored Procedures makes TDD slower. The problem with SP and  TDD is that you don't get any seam to verify the expected behavior. I may have  a procedure that takes a long time to run, and I need to call it from another procedure that eventually calls that procedure, I'm in more than a bit of a bind. I can replace the long running procedure on the fly, and spend the rest of my life running from angry DBAs, but that about all I can do. There is no way to put mocks in the database. And database access takes time. Often a lot of time.

 

 

 

Note about security: If an attacker is in the position where he can run arbitrary code in your system, you are screwed no matter what. OR/M, ADO.Net or SP, he can make the appropriate calls to do whatever he wants.

 

Oh, and unrelated tidbit, in the last six months I wrote over 600 stored procedures (data warehousing project), so I feel confident that I know my stored procedures.

time to read 1 min | 98 words

I will give some wierder problem next time, Scott solved the problem (the simple one and the harder) in a really nice way. What more, he posted about how he solved the problem, and this is great.

I often don't bother to explicitly think about the step to a solution, so it was a pleasant surprise to see how he got to the answer, not to mention that anyone reading it gains some valuable insight about how to solve those kind of problems.

Great job Scott!

time to read 15 min | 2938 words

FromDate ToDate
01/01/2000 31/01/2000
01/02/2000 28/02/2000
05/03/2000 31/03/2000

I have the following table, what must I do to get to this one:

FromDate ToDate
01/01/2000 28/02/2000
05/03/2000 31/03/2000

Basically, the first two date ranges were merge into a single date range because they were adjunct to each other.

Update: It looks like I barely get a chance to post a problem before it is solved. Let try increasing the complexity of this one. Good job, Scott.

Now I want to get all the inclusive date range. So for this data:

FromDate ToDate
01/01/2000 31/01/2000
01/02/2000 28/02/2000
05/03/2000 31/03/2000
01/02/2000 15/02/2000
15/03/2000 15/04/2000

And the result should be:

FromDate ToDate
01/01/2000 28/02/2000
05/03/2000 15/04/2000
time to read 20 min | 3895 words

Chris Bilson give a nearly complete answer of how to find the valid / invalid commands for an inmate in the comments of the post. It is a very elegant solution, in my opinion. I took upon myself to improve his solution a bit. The result is this query:

SELECT    

       inmate.FirstName,

       inmate.Surname,

       command.IssuedAt AS Start,

       dbo.CommandEnds(command.CommandType, command.ValidFor, command.IssuedAt) AS [End],

       nextCommand.IssuedAt AS StartOfNextCommand,

       CASE

       WHEN DateDiff(ms,

              dbo.CommandEnds(command.CommandType, command.ValidFor, command.IssuedAt) ,

              nextCommand.IssuedAt) > 0 THEN 'Gap In Commands!'

       WHEN getdate() > dbo.CommandEnds(command.CommandType, command.ValidFor, command.IssuedAt)

              AND  nextCommand.Id is null THEN 'Command ended!'

       ELSE 'Fine' END AS Status

FROM dbo.Commands command INNER JOIN dbo.Inmates inmate

              ON     command.InmateID = inmate.ID

       LEFT OUTER JOIN dbo.Commands nextCommand

              ON     nextCommand.InmateID = inmate.Id

              AND nextCommand.IssuedAt = (SELECT MIN(IssuedAt)

                           FROM dbo.Commands c3

                           WHERE c3.InmateId = inmate.Id

                           AND c3.IssuedAt >  command.IssuedAt )
ORDER BY inmate.Id, Start, [End]

The nice part about this solution that how the query match a command with the (logical) next command. Here is the result of the query:

FirstName Surname Start End StartOfNextCommand Status
Bad Man 2/5/2006 6:39 PM 10/5/2006 6:39 PM 10/5/2006 6:40 PM Gap In Commands!
Bad Man 10/5/2006 6:40 PM 11/5/2006 6:40 PM

NULL

Fine
Nasty Guy 3/5/2006 12:00 AM 8/5/2006 12:00 AM 7/5/2006 5:00 PM Fine
Nasty Guy 7/5/2006 5:00 PM 9/5/2006 5:00 PM

NULL

Fine

I wish that I had something similar when I needed to do this kind of software when I needed it.

time to read 4 min | 601 words


No, I'm not abandoning NHibernate to greener pastors, I'm talking about this post.
I'm not sure what the point of the author was, but on first reading, it read like FUD to me. Maybe it is the general tone of the post, or that there are a lot of numbers without any clue how the author got them, but that was my first feeling. A lot of the points that are raised are valid, but I just don't understand where the numbers came from.
I had a math teacher that used to look at a student excersize and say: "And suddenly there is a QED." when we did something stupid. I fill like saying it here.

Take for instance the part about entities and methods. I present you my most common method of dealing with O/RM:

public interface IRepository<T>
{
    void Save(T item);
    void Delete(T item);
    ICollection<T> Find(params Predicate[] preds);
    T Load(int id);
    T TryLoad(int id);
}

That is it. The implementation is generic as well, so I go five methods, not matter how many entities I have. In theory I can switch from O/RM to DAO to SOA to AOM (Acronym Of Month) with very little effort. I would replace the implementation of the IRepository<T>, and I'm done. Five methods, about 2.5 hours if I understand the math correctly, we'll add a couple of days to be safe, but we are still under 20 hours.  I don't even need to touch the UI since it works against the interface, and not the implementation.
The problem is that this math is that it is based on nothing in particular. I recently had to evaluate moving from one O/RM to another, and I had to survey the number of places where I was dependant on the O/RM. I counted 23 major features that I had to have if I didn't want to basically re-write my application from scratch.
Those ranged from lazy loading to cascades to reference equality for same entities in the same session, etc.
Replacing the O/RM would've have been hard at best, impossible at worst, and painful whatever I would have done. And I used a very similar interface to isolate myself from the O/RM (even though I never had any intention of replacing it).

Frankly, the issue is not the support of interface, but far smaller interactions. From knowing that you can map the same fields in a class to different properties in a hierarchy (an integer in one case, an object in the other, etc) is very powerful tool that will cause quite a bit of pain if you need to stop using it. That is where the cost of switching O/RM will come into play.
If you are doing anything at all interesting with your O/RM of choice (or hand written DAL), then you are going to get into some serious trouble if you think about switching.

For instance, I'm using NHibernate's Table Hierarchy to save strategies to the database, so when I load the object later, I'll get all its saved strategies with their parameters. It is making my life so much easier, but I don't want to even think what I would have to do if I had to write it all on my own.
To conclude, I agree with the first two observations, and disagree with the third. You can certainly optimize a good O/RM as much as you like. Usually you don't even need to see the O/RM code for this.
I really don't agree with his conclustions, though.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Production Postmortem (52):
    07 Apr 2025 - The race condition in the interlock
  2. RavenDB (13):
    02 Apr 2025 - .NET Aspire integration
  3. RavenDB 7.1 (6):
    18 Mar 2025 - One IO Ring to rule them all
  4. RavenDB 7.0 Released (4):
    07 Mar 2025 - Moving to NLog
  5. Challenge (77):
    03 Feb 2025 - Giving file system developer ulcer
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}