Disk capacity planning for Neo4J

3 min read · Jul 7, 2015

Neo4J is a multi-featured graph database, able to store billions of items. This brings up an interesting question: how much space will it take on disk?

Background

A Neo4J database is composed of (or better, stores) the following discrete data items on disk:

nodes
relationships
properties (string values are stored separately, in 128-byte chunks)
Lucene indexes over properties
other stuff: house-keeping files, logs,…

I will ignore the size of all other stuff and just focus on how the data itself affects disk consumption.

It is important to note that in a cluster setup every cluster member (server) holds a complete copy of the data.

Overheads

Despite Neo’s great documentation, there is no central reference for the overheads and disk sizes. This is understandable to a degree, as this is something internal and not a published contract. However, it is still a little bit annoying, as one has to look in various places to find bits and pieces.

The following table gives an overview of the various items

Example calculation

Let’s work through an example to illustrate how the numbers above come into play.

Let’s assume that we have a Neo4J database with the following 3 node instances (JSON representation) and 2 relationships with no properties attached to them.

So for this tiny dataset we have

3 node instances
2 relationship instances
24 property “instances” (or better individual property values)
…16 of which are string values
8 index entries

So the Neo4J database folder storing our model will be around 3 KB on disk (provided we ignore logs and other housekeeping files).
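The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a rough sketch: the per-record sizes below are assumptions based on figures commonly quoted for Neo4J 2.x store files, so check them against your own version. The function name `estimate_store_bytes` and its parameters are hypothetical. The index cost per entry is left as a parameter defaulting to zero, because it depends on the indexed values.

```python
# Assumed per-record sizes in bytes (Neo4J 2.x-era store format figures).
# Verify these for the version you are running.
NODE_BYTES = 15
RELATIONSHIP_BYTES = 34
PROPERTY_BYTES = 41
STRING_BLOCK_BYTES = 128  # string values are stored in 128-byte chunks


def estimate_store_bytes(nodes, relationships, properties, string_values,
                         index_entries=0, index_entry_bytes=0):
    """Rough on-disk size of the data, ignoring logs and housekeeping files.

    Simplification: one property record per property value and
    one 128-byte block per string value.
    """
    return (nodes * NODE_BYTES
            + relationships * RELATIONSHIP_BYTES
            + properties * PROPERTY_BYTES
            + string_values * STRING_BLOCK_BYTES
            + index_entries * index_entry_bytes)


# The tiny dataset: 3 nodes, 2 relationships, 24 properties, 16 of them strings.
tiny = estimate_store_bytes(nodes=3, relationships=2,
                            properties=24, string_values=16)
print(tiny)  # 3145 bytes, index entries not included
```

Under these assumptions the data comes to roughly 3 KB before any index overhead, and the string blocks account for about two thirds of it.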

A calculation template

Let’s consider the following example model to store in Neo4J.

To make things simple, we can choose to implement relational inheritance (in this case User with Employee and Contact) by simply collapsing and adding additional labels. So an employee node will have 2 labels: User and Employee.

We should know (or at least guesstimate) the following

how many and what type of properties each class will have
how many instances of each class (or better, node type) we will have
how many other nodes each one is connected to on average

Once we have this information, we can create an Excel spreadsheet similar to the following

The right table captures how many nodes of each type we have along with the properties per node.

The left table calculates the number of relationships in total based on the averages. So, for example, each Country has on average 50 Employees and each Client has 0.5 Contacts (because we know we have a lot of empty/unused Client entries).

These figures are then summed to calculate the final DB footprint.
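The same template can be sketched outside Excel. The Python below is a minimal illustration, not the spreadsheet's actual logic. All node counts and property counts are made-up figures, the record sizes are assumed Neo4J 2.x-era values, and names such as `NodeType` and `avg_links` are hypothetical. Only the two averages (50 Employees per Country, 0.5 Contacts per Client) come from the example above.

```python
from dataclasses import dataclass

# Assumed per-record sizes in bytes; verify for your Neo4J version.
NODE_BYTES, RELATIONSHIP_BYTES = 15, 34
PROPERTY_BYTES, STRING_BLOCK_BYTES = 41, 128


@dataclass
class NodeType:
    count: int              # how many nodes of this type
    properties: int         # properties per node
    string_properties: int  # how many of those properties are strings


# Hypothetical model figures, for illustration only.
node_types = {
    "Country": NodeType(count=200, properties=3, string_properties=2),
    "Employee": NodeType(count=10_000, properties=8, string_properties=5),
    "Client": NodeType(count=5_000, properties=6, string_properties=4),
    "Contact": NodeType(count=2_500, properties=4, string_properties=3),
}

# (source type, target type) -> average number of targets per source node.
avg_links = {("Country", "Employee"): 50, ("Client", "Contact"): 0.5}


def total_relationships(node_types, avg_links):
    """Relationship count derived from per-node averages."""
    return round(sum(node_types[src].count * avg
                     for (src, _dst), avg in avg_links.items()))


def estimate_footprint_bytes(node_types, avg_links):
    """Sum node, relationship, property and string-block sizes."""
    nodes = sum(t.count for t in node_types.values())
    props = sum(t.count * t.properties for t in node_types.values())
    strings = sum(t.count * t.string_properties for t in node_types.values())
    rels = total_relationships(node_types, avg_links)
    return (nodes * NODE_BYTES + rels * RELATIONSHIP_BYTES
            + props * PROPERTY_BYTES + strings * STRING_BLOCK_BYTES)


relationships = total_relationships(node_types, avg_links)
footprint = estimate_footprint_bytes(node_types, avg_links)
print(relationships, footprint)
```

In a cluster, each member would need at least this much space, since every member holds a full copy of the data.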

You can download the Excel file from this link and use it as a basis for your own calculations. The file contains a VB macro which recalculates the totals when you update any cell.

If you need to add or remove lines in the tables because your model is bigger or smaller, you will need to edit this macro.

There might be a more elegant way to create this Excel file, but I never said I was an Excel guru ;-) Your comments and suggestions are welcome.

I hope you find it useful.

Originally published at https://sgerogia.github.io on July 7, 2015.

Written by Stelios Gerogiannakis