Commit Graph

631 Commits

SHA1 Message Date
e0b385a1c5 Fix accidentally removing a ton of checks from each host 2022-01-03 13:18:14 -06:00
e2d738ba40 Enroll vm-scan-1 in backups 2021-12-30 10:37:50 -06:00
9662bb0ff8 Ensure we actually add our new user to sudoers 2021-12-30 10:12:44 -06:00
111f1cdef6 Configure a scanner user on all machines 2021-12-30 10:04:31 -06:00
1cff8a6aa8 Deploy GVM to a box at home 2021-12-30 09:34:45 -06:00
ed64fc0a9a Backup etc on desktops 2021-12-30 07:48:15 -06:00
db78f7eaf6 Separate HOSTALIAS from SERVICEDESC more properly on Nagios 2021-12-26 22:56:03 -06:00
8612eec85e Create an Ansible contactgroup in Nagios and tie it to all alerts, enroll our Matrix user in that group 2021-12-24 16:56:06 -06:00
1791c40465 Working on Matrix integration for Nagios 2021-12-24 16:47:21 -06:00
c6c57fce6c Change alert destination email address 2021-12-24 15:36:18 -06:00
09f33966ac Disable memory checks for machines running ZFS
I give up. I'll circle back on this later
2021-12-24 15:32:53 -06:00
aa493348d3 Add another Minecraft server and some related checks 2021-12-24 14:53:09 -06:00
22863e66e7 Upgrade Nextcloud 22 -> 23 2021-12-24 12:16:44 -06:00
d7c3f97797 Set up ddclient 2021-12-21 11:16:40 -06:00
5e7b8bb881 Add a Minecraft server *at home* 2021-12-20 17:24:11 -06:00
9b64cf8a00 Modularize sanitization cronjobs 2021-12-16 08:11:17 -06:00
6b218b02f9 Add a cronjob to Syncthing to clean up :Zone.Identifier files 2021-12-16 07:44:01 -06:00
911d236c84 Implement a sanitize rule for syncthing 2021-12-15 22:19:05 -06:00
060aa14df3 Fix incorrect dir for cp2077 screenshots 2021-12-15 21:12:24 -06:00
e93124e556 Add more directories to sort out, make the jobs run in parallel at 5AM 2021-12-15 21:10:20 -06:00
58196e3f24 Genericize that cronjob syntax for future endeavors 2021-12-15 20:56:45 -06:00
640e2e0efe Add a cronjob for a specific bug I'm working around with Syncthing 2021-12-15 20:41:14 -06:00
5031833f39 Remove Package Updates check
It's just pointless noise to be honest, it's way too loud. Perhaps a proper patch management solution would be in order?
2021-12-15 20:06:12 -06:00
72697a3953 Move check_disk to those restricted checks, also exclude AppImage loopback mounts 2021-12-15 19:57:20 -06:00
54a4f1539b Add some sudo rules to nagios-checker so it can start doing restricted checks 2021-12-15 19:57:08 -06:00
4b626dc6be Implement communication with Nagios when rebooting boxes
One step closer to that full automation goal
2021-12-15 19:32:19 -06:00
000d711d7a Update gulagbot DB IP 2021-12-12 09:54:15 -06:00
31018efeb1 Expose Jellyfin over 192.168.* 2021-12-11 21:50:20 -06:00
e0ce07c4dc Restart Jellyfin unless stopped 2021-12-11 16:01:21 -06:00
9aab2d6557 Tune the transaction limits for that check we just added 2021-12-09 16:23:59 -06:00
273d83be64 Add a check for old, uncommitted PostgreSQL transactions
Sometimes reading the blogs of developers whose software you use is worth it
2021-12-09 16:17:49 -06:00
fcffd834a0 Move Nagios into its own role
It was getting way too big
2021-12-08 21:34:32 -06:00
a71071b321 Spin up a SL server 2021-12-01 22:34:46 -06:00
386b190130 Add vm-desktop-1 to list of workstations 2021-12-01 07:31:38 -06:00
e85d81ef38 Drop logs for lr.cowfee.moe 2021-11-30 14:11:27 -06:00
558709ce6f Deploy libreddit 2021-11-29 23:33:56 -06:00
fab7be68c5 Tune thresholds for monitoring the age of ansible-last-run 2021-11-29 22:15:09 -06:00
1952f72c89 Add a check for the last ansible run on a given machine 2021-11-25 16:41:17 -06:00
5b12eb5af2 Add a cleanup task to touch a file upon completion of site.yml
This playbook *should* assure that we have a file we can use for checking when the last full play was. It being in a playbook at the tail end of site.yml is paramount, since site.yml dying will cause alarms to be set off.
2021-11-25 16:19:01 -06:00
ce37a7fec3 Rename a bunch of minecraft tasks to prevent ambiguity 2021-11-25 13:30:21 -06:00
6c4b1c701b Fix some unquoted number variables causing the gulagbot task to fail 2021-11-25 13:20:23 -06:00
84cd7888f1 Shut down hexxit2 2021-11-25 13:19:45 -06:00
12f33d9ffc Put Syncthing behind an ingress container 2021-11-24 10:43:29 -06:00
3d9ec54467 Nevermind, guess we scope it out even *higher* 2021-11-24 07:43:18 -06:00
733d1006be Adjust mountpoint for NFS to allow for access to higher dirs like syncthing 2021-11-24 07:41:05 -06:00
811d0bd2d2 Add some params for a new version of gulagbot 2021-11-22 14:12:13 -06:00
decd4b452b Add transaction logging to home DB 2021-11-21 19:57:33 -06:00
1c4bf65db4 Add a test DB for gulagbot 2021-11-21 17:01:58 -06:00
d67bc370ac Split nagios-checkhttp into nagios-checkhttp{,s} 2021-11-21 13:59:05 -06:00
7976630ad7 Add a box for Syncthing 2021-11-20 18:29:31 -06:00
ee5f8ffe92 Make a box to move Stalin back home where he belongs 2021-11-19 20:38:20 -06:00
cff68a2a73 Reorganize Jellyfin to reflect usage of Nvidia Container Toolkit, configure hw accel 2021-11-18 19:16:30 -06:00
05a7f19bfe Update backup dirs for PMX to reflect some PCIe passthrough stuff I'm doing 2021-11-18 16:51:32 -06:00
9680705689 Update NFS mounts for desktops 2021-11-15 11:07:11 -06:00
bca5c1993d Add a bunch of unit checks for Proxmox boxes 2021-11-11 14:00:25 -06:00
5794379da4 Add some backup configuration for PMX hosts 2021-11-11 13:54:11 -06:00
77084ebc49 Reorganize media dirs, add jellyfin to vm-media-1 2021-11-11 00:23:53 -06:00
b0b71abf6a Reorganize NFS mounts so that they don't contain SQLite DBs 2021-11-10 19:29:06 -06:00
57b1cf03ca Set nfs to rw on local connections 2021-11-10 19:14:02 -06:00
7f7a0fd2ba Run setup tasks before roles on vm-media-1 2021-11-10 19:11:10 -06:00
f3b12234c0 Start reorganizing to have pi-media-1 split into vms 2021-11-09 20:53:59 -06:00
cb6581b708 Add home db playbook 2021-11-08 16:44:04 -06:00
8c213fe693 Ensure hexxit2 backups aren't getting tarred in 2021-11-08 10:04:15 -06:00
c5d39db270 Actually implement device roles in Nagios 2021-11-07 08:55:05 -06:00
f250936fe9 Disable some relatively standard checks on hypervisors, since they're special 2021-11-07 08:38:27 -06:00
f07cb9e35c Disable docker checks for machines that don't have docker 2021-11-07 08:36:16 -06:00
4efb757c43 Download Hexxit from 9iron 2021-11-07 07:52:33 -06:00
f53726c68a Add lag goggles to hexxy 2021-11-06 16:25:44 -05:00
0edbac0520 Fix typo on no-docker tag 2021-11-06 15:56:35 -05:00
635c8c1bf4 Move motd configuration to Ubuntu machines and only Ubuntu machines 2021-11-06 15:53:57 -05:00
ea2e98e6ae Add Hexxit server, removing the tmod one 2021-11-06 14:24:57 -05:00
cec0a5c3f8 Add Ardour to desktops 2021-10-25 19:37:40 -05:00
7bbc291cf8 Edit hostnames on workstations to reflect their actual ones 2021-10-25 19:10:25 -05:00
003b13fa84 Update Gulagbot to latest 2021-10-20 12:29:53 -05:00
7e7030c613 Fix syntax on cronjob (hopefully) 2021-10-17 11:08:31 -05:00
7b624d431a Change backup cronjob up a bit for Terraria 2021-10-17 10:53:51 -05:00
07647e5ee6 Add check to devices to ensure they can ping themselves over DNS 2021-10-15 19:17:48 -05:00
fd55782766 Overhaul DNS names for machines 2021-10-15 19:03:55 -05:00
ba228984c1 Add local backups for Terraria Fargo 2021-10-14 22:57:24 -05:00
ed1c59662c Deploy a new box with Fargo 2021-10-13 12:24:27 -05:00
e5441bcc2e Update to Nextcloud 22 2021-10-12 15:59:30 -05:00
b15fdd96f5 Install imagemagick on Nextcloud to make a big warning triangle go away 2021-10-12 15:29:06 -05:00
5bc39e7f48 Fix being unable to access said share 2021-10-07 10:40:31 -05:00
cf60d672b7 Add Samba to pi-media-1
[that was easy]
2021-10-07 10:39:20 -05:00
4f07856028 Modularize contact definitions 2021-10-06 15:30:12 -05:00
37c55b9cb2 Change templating behavior of certain sections of the Nagios config 2021-10-06 15:13:55 -05:00
309bfd8694 Stop notifying on warnings for package updates 2021-10-06 15:12:41 -05:00
e85104c9fd Add DNS resolution check 2021-10-05 10:28:06 -05:00
bc1b927298 Use check_packages to check for package updates on Debian systems 2021-10-05 10:16:22 -05:00
c9808bb171 Revert "Add stale library check"
This reverts commit 0beef5617b.
2021-10-03 23:54:32 -05:00
0beef5617b Add stale library check 2021-10-03 22:39:43 -05:00
1e1946d8e0 Add memory checks to hosts 2021-10-03 22:26:37 -05:00
da3f0a24f4 Add CPU Utilization check, nerf CPU Load check 2021-10-03 15:50:25 -05:00
bfab992eb8 Add check for unapplied package updates 2021-10-03 15:47:28 -05:00
3e20928e14 Add health endpoint to exposed endpoints on matrix.desu.ltd 2021-10-03 11:51:34 -05:00
7669234df9 Allow the addition of custom checks based on config_context, add roles to hostgroups 2021-10-03 11:48:53 -05:00
18655b7d62 Bump thresholds for PSQL connection check 2021-10-03 11:04:09 -05:00
42e2a3bd22 Fix client URL for Matrix being completely wrong. I guess. 2021-10-02 22:57:10 -05:00
c12d37cad2 Work on putting Element in place 2021-10-02 22:50:25 -05:00
7337fb49ed Narrow down the pass locations for Matrix to just server endpoints 2021-10-02 22:11:10 -05:00
e05d4a379b Add basic Synapse server configuration 2021-10-02 22:03:22 -05:00
aceba8407b Add DB configuration for Synapse 2021-10-02 22:03:05 -05:00
d06fc65af9 Correct errors in nginx configuration 2021-10-02 21:53:44 -05:00
e6b2c8b0a6 Configure web1.desu.ltd for Matrix delegation
Big things a comin
2021-10-02 21:46:32 -05:00
a7aa38a8e9 Add automatic reboots to main playbook 2021-10-01 09:10:02 -05:00
90da5ad3b1 Hardlock gulagbot to 2.4.0
I BROKE IT
AAA
2021-09-29 20:16:45 -05:00
2baffca0f5 Add configuration for Home Assistant 2021-09-27 17:18:30 -05:00
27bb55bf22 Convert pi-media-1 to ingress role 2021-09-27 15:12:20 -05:00
9039a75d3c Add note to replace Nagios with naemon(?) 2021-09-26 10:40:35 -05:00
1c1c8e41ae Null-mount nsca on Nagios image
God DAMN the log spam from this thing I'm not using is fucking ridiculous
2021-09-26 10:27:33 -05:00
427014f2ae Sanitize tag hostgroups in nagios with the tag- prefix
Stumbled across an issue where I can't have a Netbox tag that's just 'ansible'
2021-09-26 10:23:46 -05:00
0c8aa0a90f Add test DB for gulagbot on Linode 2021-09-26 08:03:37 -05:00
7779db30ad Use ansible_managed where possible 2021-09-24 20:48:41 -05:00
87f9c6ceb3 Rename swap checks to be agnostic of underlying tech 2021-09-24 13:25:21 -05:00
fb006b0cd3 Add playbook and Netbox tag to run the ansible role on a host 2021-09-24 13:03:21 -05:00
8ecc7bfbb6 Modularize Netbox into several containers with workers n stuff 2021-09-23 22:09:38 -05:00
8d59a1b201 Rework mounts for netbox container 2021-09-23 21:39:10 -05:00
81988a50fd Remove defunct deb link for raspberry pi imager 2021-09-23 21:32:36 -05:00
fdeb143a56 Apply mitigation for netbox-community/netbox-docker#586, update Netbox 2021-09-21 14:49:34 -05:00
f7b5c475d5 Add device_roles_bastion play 2021-09-19 21:49:15 -05:00
fe5eb5c14d Convert role invocations to use the full namespace of the role 2021-09-18 16:10:54 -05:00
68eb7e5422 Pin Netbox to 3.0.1 since apparently the container's broken 2021-09-18 08:52:05 -05:00
6382a81f47 Remove some extraneous backup locations on web1 2021-09-18 07:27:59 -05:00
31a2371fa1 Simplify task includes 2021-09-18 07:23:03 -05:00
9b79068380 Allow for the definition of a singular proxy_pass on ingress_servers to simplify configuration 2021-09-18 07:19:26 -05:00
60bfe91947 Add role for ingress controller, move configuration into it and its data structures 2021-09-18 00:04:05 -05:00
37150bf7d1 Remove polkit.service check
Apparently it's completely normal behavior for this service to be not running on a fresh boot
2021-09-14 19:40:53 -05:00
0f1fbf4fea Allow 30 second timeouts on check_by_ssh 2021-09-14 17:26:47 -05:00
ac702380b1 Add git to the tags for monitoring-scripts 2021-09-14 17:22:50 -05:00
b4f564cade Fix mountpoints and NFS exports for media 2021-09-13 13:59:27 -05:00
3f3c7b8392 Decom the K8s cluster, roll all its jobs into one singular machine 2021-09-13 13:50:22 -05:00
e49ebc583f Upgrade Netbox to 3.0 2021-09-12 15:07:31 -05:00
e405d7bf79 Add some directives to make Nextcloud stop throwing 413s 2021-09-11 10:36:22 -05:00
3f8ecbd8f5 Fix my borked pgsql connection pooling check 2021-09-07 17:08:18 -05:00
4bf02aedd3 Add even more checks for zerotier and psql 2021-09-07 16:11:11 -05:00
3cf9b94cea Add a quick service check for postgresql 2021-09-07 15:29:26 -05:00
b349015913 Add a ton more checks for things 2021-09-07 15:00:43 -05:00
92f26b7a0c Add check for atd 2021-09-07 14:55:00 -05:00
c362effe2a Remove NRPE 2021-09-07 14:33:45 -05:00
bad192e93e Refactor Nagios checks into check_by_ssh instead of NRPE
I was never particularly fond of having a random one-off daemon doing my RCE. Sure, it offers some protection, but limiting my exposure to the open internet is far more ideal.

I have tremendously more trust in the OpenSSH project than I do in Nagios. And for that reason, I'll be deprecating NRPE and shredding config files once these plays clean up
2021-09-07 14:27:23 -05:00
b38bb4bf62 Fix improper tagging on NRPE role 2021-09-07 13:41:21 -05:00
1ca062d6ea Modularize declaration of Nagios commands 2021-09-07 13:37:06 -05:00
2a7d343ef1 Move SSH check into YAML declaration of services 2021-09-07 13:29:19 -05:00
8e845b5f4e Modularize out all our service checks
I want them in DATA STRUCTURES God dammit. Get them out of the config file.
2021-09-06 19:43:54 -05:00
d3e51301bb Remove deprecated SNMP service checks 2021-09-06 19:23:54 -05:00
fc2b3cb7b3 Rename Nagios config to more appropriately reflect its role 2021-09-06 19:13:15 -05:00
360238fdd4 Ensure we're on a version of Netbox with secrets support
*sigh*
Guess I gotta set up a vault or something now.
2021-09-01 19:25:31 -05:00
c299e505cf Add Nextcloud auto app update cronjob 2021-08-29 23:55:56 -05:00
4bea6c2168 Add _netdev to args for pi-storage-1 mount 2021-08-29 16:43:55 -05:00
a6a8cd8590 Figure out how custom_apps works with Nextcloud 2021-08-28 11:01:44 -05:00
579b2fa296 Move "all" configuration into its own playbook 2021-08-26 02:39:17 -05:00
62b6a93b65 Discard cron output again 2021-08-24 21:22:11 -05:00
20e73e6fcf I'm fucking stupid? Don't put the TTY flag on things that aren't TTYs 2021-08-24 21:21:53 -05:00
89e86efafc Log output of Nextcloud cron to file for debugging 2021-08-24 21:16:44 -05:00
45098866e3 Add some stuff for MOVIE NIGHT WIT DA BOIS 2021-08-24 16:58:45 -05:00
2cef4b1992 Fix incorrect mountpoint for srv 2021-08-24 13:28:53 -05:00
6a938ea6b3 Add Nagios user to pi-storage-1 2021-08-24 12:40:32 -05:00
3b133782c9 Have pi-storage-1 psql listen on localhost 2021-08-24 12:29:44 -05:00
f6004def4a Add system-wide cronjob for Nextcloud cron
Guess that's not containerized, huh
2021-08-24 12:25:29 -05:00
145dcfe3fb Add Redis for Nextcloud, plus some config tuning 2021-08-24 01:12:12 -05:00
fc6739907e Remove unnecessary tasks and var files 2021-08-24 00:44:17 -05:00
e49b8e26a0 Fix srv.9iron.club using a mountpoint that didn't make sense 2021-08-24 00:39:42 -05:00
54eeb4a643 God damn can I stop forgetting random small shit please 2021-08-24 00:35:41 -05:00
d8bf31b144 Add rewrite for www.9iron.club to nginx config 2021-08-24 00:32:33 -05:00
1fb222fb15 Move web1 over to a containerized setup, containerize Nextcloud 2021-08-24 00:31:11 -05:00
a6cc1ecece Move ansible_pull vars to the relevant playbook 2021-08-23 23:25:02 -05:00
38b52a5e4a Make said playbook executable 2021-08-23 23:22:17 -05:00
5486f26c76 Move S76 configuration to its own playbook with a couple of tasks 2021-08-23 23:21:58 -05:00
02dd6cd553 Reorganize ALL of the playbooks 2021-08-23 20:28:18 -05:00
a2a5f6eedc Begin a refactor of playbook naming and organization 2021-08-23 20:20:59 -05:00
7f8a06180d Rename desktops from tags_desktop to device_roles_workstation
I already have the role so I may as well
2021-08-23 20:05:27 -05:00
535509db0a Fix open quotes on NRPE config 2021-08-23 18:23:30 -05:00
26c776a7db Add check_pgsql monitors 2021-08-23 18:18:53 -05:00
406adc20b9 Tune psql1 a bit to avoid more Bleromer outages 2021-08-22 10:45:23 -05:00
7d3e8b5a86 Specify backup dirs per-desktop 2021-08-20 15:07:27 -05:00
227f5828cd Oh right the disk check is here 2021-08-18 23:45:21 -05:00
888353910d Add checks for reboot-required 2021-08-18 23:01:26 -05:00
c031124246 Tighten the thresholds for disk warnings a bit more 2021-08-18 22:37:22 -05:00
ea8e205b42 Rename a bunch of checks to be shorter 2021-08-18 22:28:41 -05:00
5efa094fdc Back up Pi k8s nodes in some rudimentary way 2021-08-18 19:15:52 -05:00
94edbeafd9 Add checks for some common Systemd units 2021-08-18 19:05:16 -05:00
051fee73d3 Clone a new monitoring-scripts repo to hosts with NRPE installed 2021-08-18 18:16:43 -05:00
812b6dff77 Destroy old MC servers 2021-08-16 00:02:37 -05:00
f8951d61a4 Tag swap monitoring separately from other NRPE checks 2021-08-15 15:36:25 -05:00
beb8cad9ed Fix swap being way too lenient when there's no swap space at all 2021-08-15 15:28:34 -05:00
668ef3a75f Fix up some checks, add the swap check 2021-08-15 15:21:46 -05:00
9f4727b6c9 Tweak checks to make them more better 2021-08-15 15:18:52 -05:00
4d0b005181 Add NRPE role, provision checks for it 2021-08-15 14:24:35 -05:00
2918a3348b Polish up our SNMP checks and playbooks 2021-08-15 13:09:04 -05:00
c745de9309 Reorder args on TCP checks to better match the natural sorting order of the iterations 2021-08-15 02:44:42 -05:00
94f6d45d07 Fix HTTP checks that redirect to TLS connections failing 2021-08-15 02:43:59 -05:00
ce77c104a6 Fix typo in docker-prune playbook 2021-08-15 02:30:20 -05:00
9ab0f62442 Genericize manually-defined checks into tagged Netbox services 2021-08-15 02:29:56 -05:00
73abab9607 Add docker-prune playbook 2021-08-15 00:59:08 -05:00
be7fa959ea Switch to a regex match for that SNMP check 2021-08-08 15:34:11 -05:00
da432c0dcc Make our Nagios SNMP user, apply some changes to its container, and spin up some barebones checks 2021-08-08 14:46:58 -05:00
a254910cdc Testing some SNMP stuff 2021-08-08 13:36:52 -05:00
caadf375f2 Add basic site checks 2021-08-08 12:35:35 -05:00
be7d1a24d6 Auto-restart nagios when its config changes 2021-08-08 12:15:10 -05:00
4c2bfb996c Actually add that Nagios template 2021-08-08 02:24:59 -05:00