Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 1 min | 103 words

To celebrate the new year, I decided to offer a single day coupon that will get you 35% discount for all my profiler products.

The coupon code is valid until the 1st of January 2011, but I won’t mention in which timezone, so you might want to hurry up.

The coupon code is:

HNY-45K2D465DD

You can use it to buy:

Happy new year!

time to read 1 min | 124 words

There have been some changes, and it seems that it is hard to track them. Here are where you can find the master repositories for the rhino tools projects:

We are hiring!

time to read 2 min | 368 words

In 2005, I got out of the army, and looked for a job. I felt that I had all the necessary qualifications (at the time, I had already written Rhino Mocks), but I run into an interesting problem. I didn’t have any commercial experience. I am sure that you are familiar with the tale. All the jobs require at least 2 – 3 years experience, and the list of qualification looked like someone just raided the TLA storage facilities.

That was pretty annoying at the time, and I wished I could do something about it. Now I can.

We aren’t hiring additional full time employees (at least not right now), but we are hiring interns. The main idea is to enable people to gain commercial experience while working on cool software.

Some information that you probably need to know:

  • You have to know to program in C#. If you have OSS credentials, that is all for the better, but it is not required.
  • This is a paid position, you are not expected to work for free.
  • It is not a remote position, our offices are located in the Southern Industrial Area in Hadera, Israel.
  • I’m not going to ask you to do all the annoying work. There is some annoying stuff to do (mostly docs), but look below to see what exactly I have in mind.

This position entails:

Working on our shipping products (the Uber Prof line of products and the Raven DB / Raven MQ line).

Among other things, I have earmarked the following features for you:

  • Raven DB’s query optimizer.
  • Raven DB’s auto sharding & scaling.
  • Raven DB’s clustering support.
  • Uber Prof’s production profiling feature.

The other things that are involved include all the usual stuff, bug fixes, doc writing and standard (in other words, not that exciting) features.

Duration:

This is a three to six months position.

What is required:

Commercial experience is not required, but I would want to see code that you wrote.

time to read 4 min | 769 words

Originally posted at 12/15/2010

Yesterday I asked what is wrong with the following code:

public ISet<string> GetTerms(string index, string field)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field));
        while (field.Equals(termEnum.Term().Field()))
        {
           result.Add(termEnum.Term().Text());

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

The answer to that is quite simple, this code doesn’t have any paging available. What this means is if we executes this piece of code on an field with very high number of unique items (such as, for example, email addresses), we would return all the results in one shot. That is, if we can actually fit all of them to memory. Anything that can run over potentially unbounded result set should have paging as part of its basic API.

This is not optional.

Here is the correct piece of code:

public ISet<string> GetTerms(string index, string field, string fromValue, int pageSize)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field, fromValue ?? string.Empty));
        if (string.IsNullOrEmpty(fromValue) == false)// need to skip this value
        {
            while(fromValue.Equals(termEnum.Term().Text()))
            {
                if (termEnum.Next() == false)
                    return result;
            }
        }
        while (field.Equals(termEnum.Term().Field()))
        {
            result.Add(termEnum.Term().Text());

            if (result.Count >= pageSize)
                break;

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

And that is quite efficient, even for searching large data sets.

For bonus points, the calling code ensures that pageSize cannot be too big :-)

time to read 2 min | 273 words

This code should never have the chance to go to production, it is horribly broken in a rather subtle way, do you see it?

public ISet<string> GetTerms(string index, string field)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field));
        while (field.Equals(termEnum.Term().Field()))
        {
           result.Add(termEnum.Term().Text());

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

As usual, I’ll post the answer tomorrow.

time to read 4 min | 691 words

Originally posted at 12/15/2010

In the last post, we have shown this code:

public class CloseableThreadLocal
{
    [ThreadStatic]
    public static Dictionary<object, object> slots;

    private readonly object holder = new object();
    private Dictionary<object, object> capturedSlots;

    private Dictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new Dictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }


    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        lock (Slots)
        {
            if (Slots.TryGetValue(holder, out val))
            {
                return val;
            }
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        lock (Slots)
        {
            Slots[holder] = val;
        }
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (capturedSlots != null)
            capturedSlots.Remove(this);
    }

    ~CloseableThreadLocal()
    {
        if (capturedSlots == null)
            return;
        lock (capturedSlots)
            capturedSlots.Remove(holder);
    }
}

And then I asked whatever there are additional things that you may want to do here.

The obvious one is to thing about locking. For one thing, we have now introduced locking for everything. Would that be expensive?

The answer is that probably not. We can expect to have very little contention here, most of the operations are always going to be on the same thread, after all.

I would probably just change this to be a ConcurrentDictionary, though, and remove all explicit locking. And with that, we need to think whatever it would make sense to make this a static variable, rather than a thread static variable.

time to read 4 min | 616 words

Originally posted at 12/15/2010

Well, the problem on our last answer was that we didn’t protect ourselves from multi threaded access to the slots variable. Here is the code with this fixed:

public class CloseableThreadLocal
{
    [ThreadStatic]
    public static Dictionary<object, object> slots;

    private readonly object holder = new object();
    private Dictionary<object, object> capturedSlots;

    private Dictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new Dictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }


    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        lock (Slots)
        {
            if (Slots.TryGetValue(holder, out val))
            {
                return val;
            }
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        lock (Slots)
        {
            Slots[holder] = val;
        }
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (capturedSlots != null)
            capturedSlots.Remove(this);
    }

    ~CloseableThreadLocal()
    {
        if (capturedSlots == null)
            return;
        lock (capturedSlots)
            capturedSlots.Remove(holder);
    }
}

Is this it? Are there still issues that we need to handle?

time to read 3 min | 546 words

Originally posted at 12/15/2010

Last time, we forgot that the slots dictionary is a thread static variable, and that the finalizer is going to run on another thread… Let us fix this, too:

public class CloseableThreadLocal
{
    [ThreadStatic]
    public static Dictionary<object, object> slots;

    private readonly object holder = new object();
    private Dictionary<object, object> capturedSlots;

    private Dictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new Dictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }


    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        if (Slots.TryGetValue(holder, out val))
        {
            return val;
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        Slots[holder] = val;
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (capturedSlots != null)
            capturedSlots.Remove(this);
    }

    ~CloseableThreadLocal()
    {
        if (capturedSlots == null)
            return;
        capturedSlots.Remove(holder);
    }
}

And now it works!

Except… Under some very rare scenarios, it will not do so.

What are those scenarios? Why do we care? And how do we fix this?

time to read 3 min | 476 words

Originally posted at 12/15/2010

Well, last time we tried to introduce a finalizer, but we forgot that we were using our own instance as the key to the slots dictionary, which prevented us from being collected, hence we never run the finalizer.

All problems can be solved by adding an additional level of indirection… instead of using our own instance, let us use a separate instance:

public class CloseableThreadLocal
{
    [ThreadStatic] public static Dictionary<object, object> slots;

    object holder = new object();
    public static Dictionary<object, object> Slots
    {
        get { return slots ?? (slots = new Dictionary<object, object>()); }
    }

    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        if (Slots.TryGetValue(holder, out val))
        {
            return val;
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        Slots[holder] = val;
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (slots != null)// intentionally using the field here, to avoid creating the instance
            slots.Remove(holder);
    }

    ~CloseableThreadLocal()
    {
        Close();
    }
}

Now it should work, right?

Expect that it doesn’t… We still see that the data isn’t cleaned up properly, even though the finalizer is run.

Why?

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}