Homelab - planning infrastructure

Before buying any hardware or deploying anything, you should prepare a general overview of the planned infrastructure. Start by defining the required and desired functionalities. Next, choose the software you want to run and check which deployment methods each app supports. Then do some simple calculations. How many users will be using your applications? Which software is the most important for the users? How much RAM and how many CPU cores are the minimum, and how much is optimal? Last, but not least: how much can you spend? Finally, decide whether you want to use containers, virtualisation, or a plain installation directly on the host.

In the following sections I will cover each of these steps. I will also say a little about software that can be useful for planning and documenting infrastructure.

Documentation tools

Choosing proper software to document your infrastructure is not a crucial step, but it can save a lot of time. Personally, I’m the guy who loves code. I admire plain text. This led me to choose the following tools:

Usage	Name
Graphs	PlantUML
Documentation site	Sphinx
Network documentation	NetBox

PlantUML is a tool that uses a simple syntax to generate diagrams. It can be easily styled and customized. For example, the following code:

@startuml
left to right direction
!include https://gist.githubusercontent.com/Dzordzu/229fcfe24ea53505cab0ee930176379d/raw/c4548bebc832c124655acf9201ee32f5421b48a5/cyborgy-theme.puml
!include <logos/debian>
!include <logos/centos>
rectangle "10.0.2.11\n\n<$debian>\n\n**node01**" as n01
rectangle "10.0.2.12\n\n<$centos>\n\n**node02**" as n02
n01 --> n02
@enduml

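generates a diagram with two rectangles, node01 (10.0.2.11, Debian logo) and node02 (10.0.2.12, CentOS logo), joined by an arrow from node01 to node02.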

Sphinx is commonly used with Python projects, but with extensions such as sphinx-markdown-tables or myst-parser it can be an extremely powerful tool for generating documentation from simple Markdown files. It can generate a multi-page HTML site, a single-page HTML document, and even a LaTeX document.
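A minimal `conf.py` for a Markdown-based Sphinx site might look like the sketch below. It assumes the myst-parser package is installed; the project name and author are placeholders and are not taken from any real configuration.

```python
# conf.py - minimal Sphinx configuration sketch for Markdown sources.
# Assumption: the myst-parser package is installed; names below are placeholders.

project = "homelab-docs"   # placeholder project name
author = "homelab admin"   # placeholder author

# myst_parser lets Sphinx read Markdown (.md) files.
extensions = ["myst_parser"]

# Accept both reStructuredText and Markdown sources.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

# Files and directories Sphinx should skip when looking for sources.
exclude_patterns = ["_build"]
```

With such a file in place, `sphinx-build -b html . _build/html` builds the multi-page site; the `singlehtml` and `latex` builders produce the other two formats.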

NetBox is open-source software that records information about your devices, networking, and cables. It may be useful if you can spare some resources for it.

Defining functionalities

The very first thing you should do is define a set of required and desired functionalities. Below you can find the questions I’ve been asking myself before infrastructure updates, gathered into a few categories.

Cloud storage

How performant should the storage be?
How fault tolerant does the storage need to be?
Do I want to be able to mount my storage to the filesystem?
Do I want to integrate my cloud storage with other services?
Do I want any special Access Control List?
Is SSO/LDAP required?
Will I be using the S3 protocol?
Will I attach and/or connect to my storage from my home devices?
Do I want to synchronize all of my devices?

Code

Do I really need my private git repository?
Do I want to use containers?
Where do I want to store my Docker images and/or language-specific artifacts? Which languages will I use?
Do I want the CI/CD process to run within my own infrastructure?

Communication

Do I want to run my own communicator (messaging service)?
Will people (family, my small company) be willing to use it?
Do I want to integrate my communicator with other solutions?

Multimedia

What kind of multimedia files do I want to store the most?
Do I need metadata about my photos/videos?
How do I want to share my multimedia files?
Should (some of) my files be publicly available?
Do I want authorized users to be able to play multimedia files (a self-hosted Netflix)?
Should users be able to edit spreadsheets, documents, and presentations online?

Misc

Do I have any special hobbies? Do I want to set up any “helper” for them?
Do I want a single login method for all of my applications?
Do I want to synchronize my calendars?
Do I want people to be able to book appointments with me?
Am I willing to try replacing minor Google services such as forms/surveys?
Do I want a private PasteBin?

Choosing software

Choosing software is one of the most important steps in the homelab building process. It is the foundation for choosing the deployment method and the hardware.

During selection you should consider the following aspects:

Does the software meet your required functionalities? If not, move on to the next candidate. If it covers only part of the requirements, look for software that provides the missing part and integrates with the one under consideration.
How easy is the installation process? If there is no Docker (OCI-compatible) image, no package, and no Chef/Puppet/Ansible automation, you should be worried.
Is the software free, or at least open source? I strongly recommend solutions with a fully open code base.
Does the software follow the UNIX philosophy? If not, this can indicate future issues with a certain set of features. For example, Nextcloud is fine, but its Swiss-army-knife approach can cause loads of issues when a Nextcloud update gets corrupted.
How actively is the software maintained?
What kind of database is used by the software?
- For SQL, I strongly recommend software that supports PostgreSQL; in my experience MariaDB and SQLite are more error prone, less stable, and less scalable.
- For graph databases, I strongly recommend software based on Neo4j or Apache AGE.
- For typical NoSQL: software that does not use MongoDB SHOULD, but DOESN’T HAVE TO, be reconsidered.
- For time-series databases, I strongly recommend solutions that allow the use of InfluxDB instead of, for example, Graphite.
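The checklist above can be turned into a rough scorecard. The sketch below is only an illustration: the criteria names, the weights, and the candidate entries are hypothetical, chosen to mirror the questions in the list rather than to rate any real project.

```python
from dataclasses import dataclass

# Hypothetical weights mirroring the checklist; tune them to your own priorities.
WEIGHTS = {
    "easy_install": 2,        # OCI image, package, or automation available
    "open_source": 2,
    "unix_philosophy": 1,     # does one thing well
    "actively_maintained": 2,
    "preferred_database": 1,  # e.g. supports PostgreSQL
}


@dataclass
class Candidate:
    name: str
    meets_requirements: bool  # hard gate: required functionalities covered
    checks: dict              # criterion name -> True/False


def score(candidate: Candidate) -> int:
    """Return the weighted sum of passed checks, or -1 if the hard gate fails."""
    if not candidate.meets_requirements:
        return -1
    return sum(w for key, w in WEIGHTS.items() if candidate.checks.get(key, False))


def rank(candidates):
    """Drop candidates failing the hard gate and sort the rest, best first."""
    viable = [c for c in candidates if score(c) >= 0]
    return sorted(viable, key=score, reverse=True)


if __name__ == "__main__":
    # Made-up candidates purely for illustration.
    a = Candidate("tool-a", True, {"easy_install": True, "open_source": True})
    b = Candidate("tool-b", True, {k: True for k in WEIGHTS})
    c = Candidate("tool-c", False, {k: True for k in WEIGHTS})
    for cand in rank([a, b, c]):
        print(cand.name, score(cand))
```

Treating the required functionalities as a hard gate rather than as one more weighted criterion matches the first question in the list: software that misses them is skipped no matter how well it does elsewhere.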

Defining system requirements

After selecting the software, you need to calculate the resources you need. In my experience the biggest bottleneck is the CPU, or the GPU if your workloads need one. Most of the time RAM usage will not even reach 90%.
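A back-of-the-envelope calculation along these lines can be sketched as follows. The service names and per-service figures are placeholders, not measurements, and the 25% headroom factor is an arbitrary assumption.

```python
import math

# Placeholder estimates (RAM in GB, CPU in cores); replace them with figures
# from each project's documentation or from your own measurements.
SERVICES = {
    "git-hosting": {"ram_gb": 4.0, "cores": 2.0},
    "file-sync": {"ram_gb": 1.0, "cores": 1.0},
    "photo-library": {"ram_gb": 2.0, "cores": 2.0},
}


def total_requirements(services, headroom=1.25):
    """Sum per-service estimates and apply a headroom factor.

    headroom=1.25 (25% spare capacity) is an arbitrary assumption.
    Returns (ram_gb, cores), both rounded up to whole units.
    """
    ram = sum(s["ram_gb"] for s in services.values()) * headroom
    cores = sum(s["cores"] for s in services.values()) * headroom
    return math.ceil(ram), math.ceil(cores)


if __name__ == "__main__":
    ram_gb, cores = total_requirements(SERVICES)
    print(f"Plan for at least {ram_gb} GB RAM and {cores} cores")
```

Running the same function with `headroom=1.0` gives the bare minimum, which is a convenient way to compare the "minimal" and "optimal" figures.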

You may also need to decide how many nodes to deploy your software on. The rule is simple: more nodes mean better high availability, but also higher power consumption.
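To put a number on the power side of that trade-off, a simple estimate like the one below can help. The wattage and electricity price used in the example are made-up values; substitute your own hardware's draw and your local tariff.

```python
def yearly_energy_cost(nodes, watts_per_node, price_per_kwh):
    """Estimate the yearly electricity cost of running `nodes` machines 24/7."""
    hours_per_year = 24 * 365
    kwh = nodes * watts_per_node * hours_per_year / 1000
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Hypothetical inputs: 30 W per node, 0.25 currency units per kWh.
    for nodes in (1, 3, 5):
        cost = yearly_energy_cost(nodes, watts_per_node=30, price_per_kwh=0.25)
        print(f"{nodes} node(s): {cost:.2f} per year")
```

The cost grows linearly with the node count, so the question becomes how much each extra node of availability is worth to you.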

Selecting method of the deployment

The last part covers selecting the way you deploy your software. For a smaller scale I personally suggest plain docker-compose. Next, I recommend adding separate storage. Then, when you have the time and the need to load balance and scale your services, move to Kubernetes. Personally I do not like Docker Swarm, but it is fine for a medium scale. Remember not to rely solely on solutions like etcd. It may be better to find solutions that can be easily integrated outside your environment.

My setup

---
required_functionalities:
   - VPN
   - Git repository
   - CICD toolkit (no jenkins - I've got bad memories with it)
   - Private communicator
   - Private docker image repository
   - Private python repository
   - Google drive alternative
   - Software for storing photographs
   - Federated login
   - Possibility to change password
   - KV secrets for services

optional_functionalities:
   - Software for network documentation
   - Software for devices documentation
   - SSO login

---
optional_services:
   auth:
      - privatebin
      - keycloak
   cloud:
      - netbox

required_services:
   code:
      - gitlab
      - nexus
   cloud:
      - nextcloud
      - minio (s3)
      - photoprism
      - synapse + element
   auth:
      - openldap
      - vault
      - passwd
      - openvpn

---

homelab_name: aio

requirements:
   separate_storage: true
   backups: true
   dynamic_dns: true
   ram: 16GB
   cores: 12
