406adc20b9
Tune psql1 a bit to avoid more Bleromer outages
2021-08-22 10:45:23 -05:00
7d3e8b5a86
Specify backup dirs per-desktop
2021-08-20 15:07:27 -05:00
227f5828cd
Oh right the disk check is here
2021-08-18 23:45:21 -05:00
a51f40d1e6
Fix reboot-home playbook too
2021-08-18 23:26:51 -05:00
888353910d
Add checks for reboot-required
2021-08-18 23:01:26 -05:00
84fa0af8d2
Fix reboot script for prod
2021-08-18 22:48:41 -05:00
c031124246
Tighten the thresholds for disk warnings a bit more
2021-08-18 22:37:22 -05:00
ea8e205b42
Rename a bunch of checks to be shorter
2021-08-18 22:28:41 -05:00
5efa094fdc
Back up Pi k8s nodes in some rudimentary way
2021-08-18 19:15:52 -05:00
94edbeafd9
Add checks for some common Systemd units
2021-08-18 19:05:16 -05:00
051fee73d3
Clone a new monitoring-scripts repo to hosts with NRPE installed
2021-08-18 18:16:43 -05:00
2421fab739
Template out extra NRPE commands, allowing full file paths
2021-08-18 18:12:30 -05:00
f9ec6f0758
Remove some files that I won't be using from the NRPE role
2021-08-18 18:09:13 -05:00
a0f9a7dd4b
Deregister the NRPE role
We forkin
2021-08-18 18:07:23 -05:00
812b6dff77
Destroy old MC servers
2021-08-16 00:02:37 -05:00
f8951d61a4
Tag swap monitoring separately from other NRPE checks
2021-08-15 15:36:25 -05:00
beb8cad9ed
Fix swap being way too lenient when there's no swap space at all
2021-08-15 15:28:34 -05:00
668ef3a75f
Fix up some checks, add the swap check
2021-08-15 15:21:46 -05:00
9f4727b6c9
Tweak checks to make them more better
2021-08-15 15:18:52 -05:00
4d0b005181
Add NRPE role, provision checks for it
2021-08-15 14:24:35 -05:00
2918a3348b
Polish up our SNMP checks and playbooks
2021-08-15 13:09:04 -05:00
c745de9309
Reorder args on TCP checks to better match the natural sorting order of the iterations
2021-08-15 02:44:42 -05:00
94f6d45d07
Fix HTTP checks that redirect to TLS connections failing
2021-08-15 02:43:59 -05:00
ce77c104a6
Fix typo in docker-prune playbook
2021-08-15 02:30:20 -05:00
9ab0f62442
Genericize manually-defined checks into tagged Netbox services
2021-08-15 02:29:56 -05:00
73abab9607
Add docker-prune playbook
2021-08-15 00:59:08 -05:00
be7fa959ea
Switch to a regex match for that SNMP check
2021-08-08 15:34:11 -05:00
da432c0dcc
Make our Nagios SNMP user, apply some changes to its container, and spin up some barebones checks
2021-08-08 14:46:58 -05:00
a254910cdc
Testing some SNMP stuff
2021-08-08 13:36:52 -05:00
39d2f932cf
Add snmpd role
2021-08-08 13:26:39 -05:00
caadf375f2
Add basic site checks
2021-08-08 12:35:35 -05:00
be7d1a24d6
Auto-restart nagios when its config changes
2021-08-08 12:15:10 -05:00
e3c5c00272
Fix not including the zerotier playbook
2021-08-08 02:28:22 -05:00
4c2bfb996c
Actually add that Nagios template
2021-08-08 02:24:59 -05:00
e968d4a7cf
Fix up that jank config and make it actually totally usable
2021-08-08 01:20:48 -05:00
2f06fe61e0
Add pynetbox to important things
2021-08-08 00:30:41 -05:00
5d5cab59eb
Add Nagios and some dysfunctional templating code
2021-08-08 00:28:25 -05:00
691a934297
Genericize the inclusion of libraspberrypi-bin
2021-08-07 17:23:15 -05:00
d68e3430a8
Modularize zerotier as well
2021-08-07 17:14:28 -05:00
759df2f593
Allow for dynamic tagging of ansible-pull hosts
2021-08-07 17:09:20 -05:00
07ea9806da
Docker goes on everything
2021-08-07 16:55:28 -05:00
38f70d0fca
Unify motd definition
2021-08-07 16:52:19 -05:00
30dd4ff8dc
Divide webservices into task files
2021-08-07 16:49:24 -05:00
69f3edcf2b
Clean out deprecated k8s garbage
2021-08-07 16:31:36 -05:00
55304ac4d9
Rename pistorage to tags_pistorage
2021-08-07 12:16:07 -05:00
340da1926e
Move gameservers into task files
2021-08-07 12:08:29 -05:00
d6328146b3
Add nfs-common to common role
2021-08-07 11:52:03 -05:00
caabd61057
Revert "Break out testing into its own triple-parallelized flow"
This reverts commit 9e5e2a23d4.
2021-08-05 11:50:08 -05:00
9e5e2a23d4
Break out testing into its own triple-parallelized flow
2021-08-05 01:27:41 -05:00
0c1fab838f
Run test plays on scheduled jobs
It makes sense to skip the test on a pipeline since it just
signifies an application update or a re-run and probably wants
to complete quickly. It does not make sense to get rid of our
safeguards on a job that runs at 1AM every night.
2021-08-05 01:10:37 -05:00