Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive

Funding options

time to read 8 min | 1433 words

This is a divergence from my usual discussion on technical stuff. In this post, I want to talk about money. In particular, how you get it from other people. Note that I am neither an expert nor qualified to talk about the subject matter, this post came out of a lot of scribbled notes and is mostly meant to serve as a way to lay down a line of thought. All numbers are made up, and while I would like such a car, it would be mostly to inflict it on the employee of the month.

There are many cases in the lifecycle of a business where you need more cash than you currently have (or are willing to spend outright).

A common scenario is when you start a business, or when you want to expand it. For our discussion, we’ll use the example of the following drool worthy car:

I consider such a piece of art priceless, but let us say that I managed to convince the owner to sell it to me for the nice sum of 1,000,000$.

Unfortunately, I don’t have 1,000,000$. I only have 650,000$. So long, beautiful car, it was very nice to know you, but it is just not possible. Except that there is this thing where people give you a lump sum of money, and you give it back over time (although usually more than you got).

Funding is important for businesses in the same manner that breathing is for people. There are typically several ways to fund a business:

  • Direct cash infusion – That is usually how most business start. The amount of money put into the business depend on what it needs to do. A web developer would need the money buy a laptop and a Starbucks loyalty card, so that is easy. For a restaurant, you need enough money for rent, employees, equipment, etc. The smaller the amount you need to put into the business to kick start it, the easier it is to just use your own saving to do so.
  • Partners – This is pretty much the same as the previous one, but instead of having only one person do that, you have multiple people and more savings to dip into.
  • Angels/Investors – Those are people who for various reasons would give you money. Sometimes this is because they are related to you, but often time it is a calculated move, investing some money in a business in order to get a stake in it and cash it in afterward.
  • Government development loan / grant – Sometimes you can get this, and they usually have both very good terms, and really strict rules, regulation and hops to jump through.
  • Bank / credit loan – Well, you are presumably familiar with that. You get a loan, pay interest, mortgage some assets, etc.
  • Self funded – Your business is making more money than it is spending, therefor you have money to spend on the business.

The best choice is self funding, because that mean that you are profitable and aren’t beholden to someone. The other really depend on personal preferences. Here are mine:

  • Direct cash infusion – That works for starting a business with low starting overhead costs (see, single developer shop). It might also be viable if you have a lot of personal wealth that you can put into the business, but personally, I like to think about the money flowing in the other direction. Otherwise that is an indication that there is something strange going on here.
  • Partners – I used to work at a place that was owned by 7 founding members + 1 “silent partner”. I still remember when the entire company got an email from a co-CEO that was basically: “You are forbidden to discuss project X or anything related to it with the other co-CEO”. That left an… impression, shall we say. Also, this is again something that you would usually do in the beginning. Bringing a partner into an existing business implies one of a few things. You are in a big trouble (either personally or the business) and need cash infusion that you can’t/won’t supply or you are doing really well and people are flocking to join you.
  • Investors/Angels – This is very similar to the previous point, with the caveat that investors usually aren’t going to meddle in the day to day affairs, nor are they going to shoulder any burden. They are there to provide the money, some expertise/networking but that is basically it. They do create a pretty huge amount of bureaucracy, reports, compliance, etc. The investors needs to know that you aren’t blowing away their money, after all.
  • Government development loan / grant – This is pretty much the same as the previous one, only the investor is the government. If you thought that investors generated a lot of paperwork, you were mistaken.

The remaining two options are self funding and getting a loan. Now, assuming that no one else buy this magnificent car, I can put some numbers in Excel and predict that in a couple of years, I’ll have enough money to buy it outright. So all I need to do is ask the owner to not sell it to anyone, hope that my cash flow remain according to projections, hope the price doesn’t change and just wait.

Of course, that means that I can’t crash lift moral by making this the official company car in the meantime. I’m losing quite a lot of amusing moments by waiting, and that is assuming that it is still possible in two years. Of course, if in two years I would have the money to do so, I’m not so sure that I would still want to just purchase it directly. That would mean having no money at all. And that is kinda of scary, because salaries need to be paid, and this car doesn’t look like it has good gas/mileage ratio.

So the option that we have left is taking a loan. The nice thing about doing that is that we can mortgage the actual asset that we are buying, this magnificent car. Now, the bank may not value it as much as I do, so they are going to give it a price of only 900,000$, and then they are going to only agree to fund 80% of that, which gives us 720,000$.

In other words, that means that we need to puny up 380,000$, which is much more reasonable, and leave us with a bit of free cash cushion. That lead to a few interesting observations:

  • The loan amount and the money we already have are comparable. That means that the bank is going to be much nicer to us than if we wanted to borrow much more money than we already have (on the assumption that if we got this amount of money once, we’ll be able to get it again to pay them).
  • There is a valid asset to mortgage, which reduce the loan risk (and thus get us better terms).
  • The current interest environment is at an all times low, which mean that this is a great time to loan money (and bad time to try to save).

This means that this is a much simpler deal than going to a bank with a business plan and hoping that they will believe that we can make it. Now, let us get down to the financial details.

An offer from bank A is for an interest rate of 4%. That gives us a month payment over ten years of 7,290$ per month.

An offer from bank B is for an interest rate of 4.25%. Which gives a monthly payment of 7,375$ per month.

That is a simple number game, and we are pretty much done at this point, right? Almost, but let us project this over 10 years, and see where that put us.

Bank A: Total amount of interest paid is 154,800$

Bank B: Total amount of interest paid is 165,000$

In other words, the total difference is 10,200$. That means that while it is still a numbers game, it isn’t just the interest rate. The reason is that we now need to consider a lot more aspects. For example, Bank B may have an easier loan approval process, or require less security, or value the car higher than bank A. Bank A doesn’t allow early cash out, while bank B does, or a million and one other differences.

The question now becomes is whatever the other stuff beyond the raw interest rate can be quantified, and whatever it is worth more than 10,000$.

As I said earlier in this post, this is mostly settling things in my mind. Feel free to ignore this post all together.

time to read 3 min | 447 words

RavenDB is doing a pretty great job for being a production database, in fact, we have designed it upfront to only have features that make sense to have for robust production systems.

In particular, we don’t have any form of ad-hoc queries. A query always hits an index, so it is very fast. Even what we call dynamic queries in RavenDB are actually creating an index behind the scene. This is pretty awesome for normal production usage, but it does have some limitations when you want to explore the data. This can be because you are a developer trying to find a particular something, and you just want to quickly fire off random queries. You don’t care about the costs, and you don’t want to generate indexes. Or you can be an admin that needs to get a particular report from the system and you want to play around with the details until you get everything right.

In order to serve those needs, RavenDB 3.5 is going to have a really nice feature, explicit data exploration.

For example, let us say that I want to count the number of unique words in all of my posts, I can do it using the following:

image

Note that the actual query is pretty meaningless, and I’m writing this at 1AM with a baby nearby that make funny noises, so the Linq statement there works, but can probably be better.

The point here is that to demo what is going on. We write a simple Linq statement, and can run it against our database, and then gather the results back. It is like having LinqPad directly inside the RavenDB studio. In fact, that is the number one scenario that we envision for this feature, replacing LinqPad usage by having a native capability.

Now, some caveats. As you can see, you can select to limit the query duration as well as the number of documents it will operate on. That give us a quick way to explore the data without putting too much load on the server. You can even take the output here and throw it directly to Excel. “Sam, can you give the a breakdown of orders this year by month and country? Just email me the Excel spreadsheet”.

Note that this is intended as a user feature, it isn’t something that we provide an API for. It is there for admins or developers that are figuring things out, an admin feature, not something that you want to use on production.

time to read 3 min | 569 words

I mentioned that we have created our own thread pool implementation in RavenDB to handle our specific needs. A common scenario that ended up quite costly for us was the notion of parallelizing similar work.

For example, I have 15,000 documents to index .That means that we need to go over each of the documents and apply the indexing function. That is an embarrassingly parallel task. So that is quite easy. One easy way to do that would be to do something like this:

foreach(var doc in docsToIndex)
	ThreadPool.QueueUserWorkItem(()=> IndexFunc(new[]{doc}));

Of course, that generates 15,000 entries for the thread pool, but that is fine.

Except that there is an issue here, we need to do stuff to the result of the indexing. Namely, write them to the index. That means that even though we can parallelize the work, we still have non trivial amount of startup & shutdown costs. Just running the code like this would actually be much slower than running it in single threaded mode.

So, let us try a slightly better method:

foreach(var partition in docsToIndex.Partition(docsToIndex.Length / Environment.ProcessorCount))
	ThreadPool.QueueUserWorkItem(()=> IndexFunc(partition));

If my machine has 8 cores, then this will queue 8 tasks to the thread pool, each indexing just under 2,000 documents. Which is pretty much what we have been doing until now.

Except that this means that we have to incur the startup/shutdown costs a minimum of 8 times.

A better way is here:

ConcurrentQueue<ArraySegment<JsonDocument>> partitions = docsToIndex.Partition(docsToIndex.Length / Environment.ProcessorCount);
for(var i = 0; i < Environment.ProcessorCount; i++) 
{
	ThreadPool.QueueUserWorkItem(()=> {
		ArraySegment<JsonDocument> first;
		if(partitions.TryTake(out first) == false)
			return;

		IndexFunc(Pull(first, partitions));
	});
}

IEnumerable<JsonDocument> Pull(ArraySegment<JsonDocument> first, ConcurrentQueue<ArraySegment<JsonDocument>> partitions )
{
	while(true)
	{
		for(var i = 0; i < first.Count; i++)
			yield return first.Array[i+first.Start];

		if(partitions.TryTake(out first) == false)
			break;
	}
}

Now something interesting is going to happen, we are scheduling 8 tasks, as before, but instead of allocating 8 static partitions, we are saying that when you start running, you’ll get a partition of the data, which you’ll go ahead and process. When you are done with that, you’ll try to get a new partition, in the same context. So you don’t have to worry about new startup/shutdown costs.

Even more interesting, it is quite possible (and common) for those tasks to be done with by the time we end up executing some of them. (All the index is already done but we still have a task for it that didn’t get a chance to run.) In that case we exit early, and incur no costs.

The fun thing about this method is what happens under the load when you have multiple indexes running. In that case, we’ll be running this for each of the indexes. It is quite likely that each core will be running a single index. Some indexes are going to be faster than the others, and complete first, consuming all the documents that they were told to do. That means that the tasks belonging to those indexes will exit early, freeing those cores to run the code relevant for the slower indexes, which hasn’t completed yet.

This gives us dynamic resource allocation. The more costly indexes get to run on more cores, while we don’t have to pay the startup / shutdown costs for the fast indexes.

time to read 3 min | 579 words

With RavenDB 3.5, we are focusing on performance as one of the key features. I’ve already spoken at length about the kind of changes that we had made to improve performance. A few percentage points here and there end up being quite significant when you add them all together.

But just sanding over the rough edges isn’t quite enough for us. We want to have a major impact, not just an avalanche of small improvements. In order to handle that, we needed to be much more aware of how we are making use of resources in the system.

The result of several months of work is now ready to enter into performance testing, and I’m quite excited about it. But before I show you the results, what is it? Well, RavenDB does quite a lot in the background, to avoid holding up a request thread when you are calling RavenDB. This mean that we have a lot of background work, indexing, map/reduce, etc.

We have been using the default .NET thread pool for a long time to do that, and it has served us very well. But it is a generic construct, without awareness of the unique needs that RavenDB has. Therefor, we worked for quite some time to create our own Thread Pool that match what we do.

The major changes with the RavenThreadPool (RTP from now) are:

  • There is a fixed (and dedicated) number of threads that will do the work, sharing (and stealing) work among themselves.
  • Indexes tasks are continuous and shared, so a big indexing work will spread across all threads, but with a preference for locality of work if we have a lot of stuff to parallel.
  • A slow index doesn’t stop us from working on other indexes, we’ll let it process on its own, and let the other indexes run forward without it.
  • Dynamic adjusting of the amount of work that is allowed for the indexes means that under load, we can dynamically and rapidly reduce the amount of work we are doing to allow more resources for processing requests.

There are other stuff, but they are mostly of interest for the people who work on RavenDB, not on those who use it.

And the results, they are pretty good. Here is the before and after sample.

image

Note that we have a mix here of various types of indexes. The X axis is time, and the Y axis is the number of documents indexed.

As you can see, in the before (without RTP), we are processing all indexes roughly on the same course, with a pretty flat growth over time.  However, with RTP, the situation is different. You can see that very quickly the fast indexes are starting to outpace both their version without RTP and the slower indexes.

That, in turn, means that they complete much faster. In the case of the Simple Map index, it complete indexing roughly 50% faster than without RTP. But even more interesting is what happens globally, because we are able to complete indexing of the fast indexing fast, it means that we can process the slow index (HeavyMapReduce) with more resources. So even this slowpoke completes about 15% faster with RTP than without it.

We are still running tests, but even so, this is quite exciting.

Project Tamar

time to read 1 min | 127 words

I’m happy to announce that despite the extreme inefficiencies involved in the process, the performance issues and what are sure to be multiple stop ship bugs in the way the release process is handled. We have successfully completed Project Tamar.

The result showed up as 2.852 Kg bundle, and is currently sleeping peacefully. I understand that this is a temporary condition, and lack of sleep shall enthuse shortly. I’ll probably regret saying that, but I can’t wait.

 

This little girl is named Tamar and to celebrate, I’m offering a 28.52% discount for all our products. This include:

Just use coupon code: Tamar

This will be valid for the next few days.

time to read 1 min | 170 words

We have a new build out, build 3660, which contain a lot of fixes and some really nice stuff for RavenDB users.

You can get it here.

The new stuff is quite cool.

Global

  • Changed default data location (Now will go to C:\Raven).
  • Various performance optimizations across both server and client.

RavenFS

  • Versioning support for RavenFS
  • Better metadata queries for RavenFS
  • Smuggler support for RavenFS

RavenDB

  • Client side indexes will automatically specify sort orders
  • Prevented high CPU and excessive GC runs under low memory conditions
  • Avoid leaking resources when failing to create a database.
  • Faster JSON serialization and deserialization
  • Allow to lock and set index priorities via the client API
  • Added backoff strategy for failing periodic exports
  • Recognize Windows users with admin rights to system database as server admins
  • Facets can now have very large number of facets
  • Replication will now replicate indexes between nodes (and gossip about usage)

We are now aiming at RavenDB 3.5, which should result in some really great stuff showing up.

time to read 2 min | 272 words

We have a support hotline for RavenDB, and usually we get people that give us “good” problems to solve. And then we have the people who… don’t.

The following are a few of the strangest issues from the past month or so.

  • OutOfMemoryException is thrown when running RavenDB in 32 bits mode, and documents of size of 50-70MB size are used.

Solution – When running in 32 bits mode, RavenDB only have 2GB of virtual memory to work with, and that is really not enough to do much. There is no reason today for any server app to run in 32bits mode. Also, a 70MB document?! Seriously!

  • Very slow startup time for RavenDB when the number of indexes approaches 20,000.

Solution – That isn’t a typo, honest. We had a customer that sent us a database that had 19,273 indexes in it. When they restarted the database, we had to load all of those indexes, and that took a… while. And no, those weren’t dynamic indexes, they were static indexes that (I hope, at least) were probably generate by a tool.

  • Index creation can take a long time when the number of map indexes in a multi map index is higher than 150.

Solution – Are you trying to be funny?! What it is that you are doing?

  • Index creation can take a long time when the size of the index definition is greater than 16KB.

Solution – that is a single index definition that goes on for roughly 3,000 lines. You are doing things wrong.

 

What is the worst thing that you have seen?

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}