# Building a Talos Kubernetes Cluster from Scratch
Building a Kubernetes cluster from scratch on bare metal is one of those projects that teaches you more about infrastructure than any managed service ever could. This is the story of K8S-CLUSTER — a 6-node Talos Linux cluster running on mini PCs, with full disk encryption, Cilium eBPF networking, and Longhorn distributed storage.
## Why Talos?
Talos Linux is a minimal, immutable OS purpose-built for Kubernetes. There's no SSH, no shell, no package manager — everything is managed through a declarative API. This makes it ideal for a homelab where you want production-grade infrastructure without the maintenance burden of traditional Linux nodes.
The tradeoff is steep: you can't just apt install something when you need it. Every system extension must be baked into the installer image at provision time via Image Factory. But what you get in return is a cluster that's reproducible, auditable, and resistant to configuration drift.
## Cluster Architecture
The cluster runs on 6 bare-metal mini PCs, all acting as combined control-plane and worker nodes:
| Node | IP | CPU | RAM | Role |
|---|---|---|---|---|
| node01 | 10.x.x.16 | 4C | 8GB | CP + Worker |
| node02 | 10.x.x.17 | 6C | 8GB | CP + Worker + Storage |
| node03 | 10.x.x.18 | 4C | 16GB | CP + Worker + Storage |
| node04 | 10.x.x.19 | 4C | 16GB | CP + Worker + Storage |
| node05 | 10.x.x.20 | 4C | 16GB | CP + Worker + Storage |
| node06 | 10.x.x.21 | 4C | 16GB | CP + Worker + Storage |
A virtual IP at 10.x.x.10 floats across the control-plane nodes, providing a stable API endpoint without an external load balancer. KubePrism provides an additional load-balanced API endpoint on localhost:7445 of every node, so in-cluster clients keep a working API path even when the VIP moves.
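The VIP lives in each control-plane node's network configuration. A minimal sketch of what such a patch might look like — the interface name and addresses are placeholders, since the actual patch files aren't shown here:

```yaml
machine:
  network:
    interfaces:
      - interface: eth0        # placeholder; actual NIC name varies per node
        dhcp: false
        addresses:
          - 10.x.x.16/24       # this node's static IP
        vip:
          ip: 10.x.x.10        # shared VIP; Talos elects which CP node holds it
```

Talos handles the VIP election itself via etcd, so no keepalived or external load balancer is involved.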
## Config Architecture: Base + Patches
One of the most important design decisions was separating the machine configuration into composable layers:
- `controlplane.yaml` — Cluster-wide settings: API server config, PKI certificates, audit policy, PodSecurity admission, kubelet settings. Contains zero node-specific configuration.
- Per-node patches — Each node gets its own patch with hostname, static IP, VIP assignment, and disk layout.
- Feature patches — Cilium CNI, LUKS encryption, Longhorn extensions, and control-plane scheduling are each separate patches.
The apply command stacks them:
```bash
talosctl apply-config --insecure --nodes 10.x.x.16 \
  --file controlplane.yaml \
  --config-patch @patches/k8s-node01.yaml \
  --config-patch @patches/cilium-cni.yaml \
  --config-patch @patches/allow-scheduling-cp.yaml \
  --config-patch @patches/disk-encryption.yaml \
  --config-patch @patches/longhorn-extensions.yaml
```

This separation means adding a new feature (like disk encryption) is a single patch applied to all nodes, and per-node hardware differences (interface names, disk paths) are isolated to their own files. No merge conflicts, no configuration drift.
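To give a sense of how small a feature patch can be, the control-plane scheduling patch likely reduces to a single field. A sketch — the actual contents of `patches/allow-scheduling-cp.yaml` aren't shown in the post:

```yaml
# Hypothetical patches/allow-scheduling-cp.yaml:
# lets regular workloads schedule onto control-plane nodes,
# which is what makes the combined CP+Worker layout work.
cluster:
  allowSchedulingOnControlPlanes: true
```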
## LUKS2 Full Disk Encryption
Both the STATE partition (Talos OS state) and EPHEMERAL partition (container runtime data) are encrypted with LUKS2:
```yaml
apiVersion: v1alpha1
kind: VolumeConfig
name: STATE
encryption:
  provider: luks2
  keys:
    - nodeID: {}
      slot: 0
---
apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL
encryption:
  provider: luks2
  keys:
    - nodeID: {}
      slot: 0
```

The `nodeID` key type derives the encryption key from the machine's unique identifier — no manual key management, no key distribution problem. Each node's encrypted volumes appear as `/dev/dm-0` and `/dev/dm-1`. Performance impact on NVMe is negligible.
## Cilium eBPF Networking
The cluster uses Cilium v1.19.1 as the CNI with full kube-proxy replacement via eBPF. The base config sets:
```yaml
cluster:
  network:
    cni:
      name: none   # Talos bootstraps without a CNI; Cilium is installed via Helm post-bootstrap
  proxy:
    disabled: true # Cilium replaces kube-proxy entirely
```

Cilium is deployed post-bootstrap via Helm with L2 load balancing, Hubble observability, and 2 operator replicas for HA. The eBPF dataplane handles all packet forwarding at the kernel level — no iptables rules to manage or debug.
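With kube-proxy disabled and KubePrism listening locally on port 7445, the Helm install can point Cilium's Kubernetes API access at the local endpoint. A sketch of the deployment command — values beyond `kubeProxyReplacement`, `k8sServiceHost`/`k8sServicePort`, and `operator.replicas` (such as the L2 and Hubble settings) are omitted and would need to match the cluster's actual values file:

```shell
# Sketch: install Cilium right after `talosctl bootstrap`, pointing it
# at KubePrism's local API endpoint rather than the in-cluster service.
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --version 1.19.1 \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=localhost \
  --set k8sServicePort=7445 \
  --set operator.replicas=2
```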
One critical gotcha: Cilium's eBPF kube-proxy replacement breaks Tailscale Service annotations. The eBPF dataplane intercepts ClusterIP DNAT at the traffic-control layer, causing asymmetric routing that confuses Tailscale's proxy. The fix is to use Tailscale Ingress resources (L7) instead of Service annotations (L4). More on this in a future post.
## Longhorn Distributed Storage
Persistent storage runs on Longhorn v1.8.1 with 3-way replication across 5 storage nodes (node02-06). Each storage node has a SATA SSD mounted at `/var/lib/longhorn` via Talos's `machine.disks` config.
The raw capacity is ~1.8 TB (1x 1TB + 4x 200GB), giving ~600 GB usable with 3-way replication. Longhorn is set as the default StorageClass so all PVC requests automatically get distributed, replicated storage.
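The mount itself is declared in each storage node's patch. A sketch of the `machine.disks` stanza — the device path is a placeholder and varies per node:

```yaml
machine:
  disks:
    - device: /dev/sda              # SATA SSD; actual device path varies per node
      partitions:
        - mountpoint: /var/lib/longhorn   # Longhorn's default data path
```

Talos partitions and formats the disk on apply, so the mount survives reboots and upgrades without any manual fstab management.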
Longhorn requires iSCSI tools to manage block devices. Since Talos has no package manager, these are baked into the installer image via Image Factory:
```yaml
machine:
  install:
    image: factory.talos.dev/installer/your-schematic-hash....:v1.12.5
```

That schematic hash includes `iscsi-tools` and `util-linux-tools`. If you need to add or remove extensions, you generate a new schematic and re-provision.
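The schematic itself is a small YAML document submitted to the Image Factory, which returns the hash used in the installer URL. For this cluster it would look roughly like:

```yaml
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools        # iSCSI initiator for Longhorn volumes
      - siderolabs/util-linux-tools   # block-device utilities Longhorn expects
```

Because the hash is derived from the schematic contents, any change to the extension list produces a different hash — which is why adding an extension means re-provisioning.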
## Security Hardening
The cluster enforces multiple security layers:
- PodSecurity Admission — Baseline enforcement cluster-wide, with exemptions only for `kube-system` and `longhorn-system`
- Seccomp — `RuntimeDefault` profile enforced by the kubelet for all containers
- Audit Policy — RequestResponse logging for secrets and RBAC writes, Metadata for everything else
- CiliumNetworkPolicy — Per-namespace egress/ingress rules (default deny)
- Encrypted etcd backups — Daily snapshots encrypted with age, 30-day retention
The audit policy is deliberately scoped: full request/response for sensitive operations (secrets, RBAC bindings), metadata-only for everything else. This keeps etcd write pressure low while capturing the events that matter for security forensics.
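A sketch of an audit policy with that shape — the exact resource list in the real cluster may differ:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Full request/response bodies only for sensitive operations
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets"]
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # Metadata-only for everything else keeps etcd write pressure low
  - level: Metadata
```

Rules are evaluated in order, so the catch-all `Metadata` rule must come last.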
## etcd Backup Strategy
A cron job on the management host runs daily at 2 AM:
```bash
talosctl etcd snapshot etcd-backup.snapshot
age -r age1xx... -o etcd-backup.snapshot.age etcd-backup.snapshot
rm etcd-backup.snapshot
find backups/ -name "*.age" -mtime +30 -delete
```

Snapshots are taken from node01 (the first control plane), encrypted with age, and stored locally with 30-day retention. Recovery requires the private key stored in `.age-key.txt` (not in git, not on any node).
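Restore is the mirror image: decrypt the snapshot with the private key, then bootstrap etcd from it. A sketch — the node IP and filenames are illustrative, and a real recovery would follow the full Talos disaster-recovery procedure:

```shell
# Decrypt the latest snapshot with the offline private key...
age -d -i .age-key.txt -o etcd-backup.snapshot etcd-backup.snapshot.age
# ...then bootstrap a fresh control plane from it
talosctl -n 10.x.x.16 bootstrap --recover-from=./etcd-backup.snapshot
```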
## Lessons Learned
After building and migrating this cluster (including a full subnet migration from 192.168.x.0/24 to 10.x.x.0/24), here are the gotchas worth knowing:
- Old disk signatures block partitioning — LVM/bluestore signatures on secondary disks prevent Talos from partitioning them. Wipe first with a privileged pod: `dd if=/dev/zero of=/dev/sda bs=1M count=10`
- Device names shift after wipe — A disk at `/dev/sdb` can become `/dev/sda` after removing the LVM device mapper. Standardize your config after cleanup.
- etcd doesn't auto-update peer URLs on IP change — Subnet migration requires removing the etcd member, resetting STATE+EPHEMERAL, and re-applying config in maintenance mode.
- Image Factory schematics are immutable — Need a new extension? New schematic hash, new installer image, full node re-provision.
- `HostnameConfig` overrides `machine.network.hostname` — If both exist, `HostnameConfig` wins silently. Remove it from your base config.
- Regenerate kubeconfig after reboot — Talos rotates certificates on boot. Run `talosctl kubeconfig --force` to stay current.
- Single-node etcd reset needs `--graceful=false` — Without it, etcd hangs waiting for quorum that doesn't exist.
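For the disk-signature wipe, a throwaway privileged pod pinned to the affected node does the job. A sketch — the node name, device path, and image are placeholders, and since this zeroes the start of the disk, triple-check the device before running it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: disk-wipe
spec:
  nodeName: node02            # pin to the node with the stale disk
  restartPolicy: Never
  containers:
    - name: wipe
      image: alpine:3.20
      # Zero the first 10 MiB to clear LVM/bluestore signatures
      command: ["dd", "if=/dev/zero", "of=/dev/sda", "bs=1M", "count=10"]
      securityContext:
        privileged: true       # needed for raw block-device access
      volumeMounts:
        - name: dev
          mountPath: /dev
  volumes:
    - name: dev
      hostPath:
        path: /dev
```

Delete the pod afterwards; nothing about it needs to persist.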
## What's Running
The cluster currently hosts a mix of web applications, databases, internal tools, and a full monitoring stack. All services are exposed via Tailscale Ingress for private access, with selected services also available publicly through Cloudflare Tunnel.
Total resource usage is modest: ~2.1 CPU cores requested, ~2.2 GB RAM requested, ~30 GB storage across all workloads. There's plenty of headroom for growth on this 26-core, 80 GB cluster.