2181 Commits

SHA1 Message Date
406adc20b9 Tune psql1 a bit to avoid more Bleromer outages 2021-08-22 10:45:23 -05:00
7d3e8b5a86 Specify backup dirs per-desktop 2021-08-20 15:07:27 -05:00
227f5828cd Oh right the disk check is here 2021-08-18 23:45:21 -05:00
a51f40d1e6 Fix reboot-home playbook too 2021-08-18 23:26:51 -05:00
888353910d Add checks for reboot-required 2021-08-18 23:01:26 -05:00
84fa0af8d2 Fix reboot script for prod 2021-08-18 22:48:41 -05:00
c031124246 Tighten the thresholds for disk warnings a bit more 2021-08-18 22:37:22 -05:00
ea8e205b42 Rename a bunch of checks to be shorter 2021-08-18 22:28:41 -05:00
5efa094fdc Back up Pi k8s nodes in some rudimentary way 2021-08-18 19:15:52 -05:00
94edbeafd9 Add checks for some common Systemd units 2021-08-18 19:05:16 -05:00
051fee73d3 Clone a new monitoring-scripts repo to hosts with NRPE installed 2021-08-18 18:16:43 -05:00
2421fab739 Template out extra NRPE commands, allowing full file paths 2021-08-18 18:12:30 -05:00
f9ec6f0758 Remove some files that I won't be using from the NRPE role 2021-08-18 18:09:13 -05:00
a0f9a7dd4b Deregister the NRPE role 2021-08-18 18:07:23 -05:00
    We forkin
812b6dff77 Destroy old MC servers 2021-08-16 00:02:37 -05:00
f8951d61a4 Tag swap monitoring separately from other NRPE checks 2021-08-15 15:36:25 -05:00
beb8cad9ed Fix swap being way too lenient when there's no swap space at all 2021-08-15 15:28:34 -05:00
668ef3a75f Fix up some checks, add the swap check 2021-08-15 15:21:46 -05:00
9f4727b6c9 Tweak checks to make them more better 2021-08-15 15:18:52 -05:00
4d0b005181 Add NRPE role, provision checks for it 2021-08-15 14:24:35 -05:00
2918a3348b Polish up our SNMP checks and playbooks 2021-08-15 13:09:04 -05:00
c745de9309 Reorder args on TCP checks to better match the natural sorting order of the iterations 2021-08-15 02:44:42 -05:00
94f6d45d07 Fix HTTP checks that redirect to TLS connections failing 2021-08-15 02:43:59 -05:00
ce77c104a6 Fix typo in docker-prune playbook 2021-08-15 02:30:20 -05:00
9ab0f62442 Genericize manually-defined checks into tagged Netbox services 2021-08-15 02:29:56 -05:00
73abab9607 Add docker-prune playbook 2021-08-15 00:59:08 -05:00
be7fa959ea Switch to a regex match for that SNMP check 2021-08-08 15:34:11 -05:00
da432c0dcc Make our Nagios SNMP user, apply some changes to its container, and spin up some barebones checks 2021-08-08 14:46:58 -05:00
a254910cdc Testing some SNMP stuff 2021-08-08 13:36:52 -05:00
39d2f932cf Add snmpd role 2021-08-08 13:26:39 -05:00
caadf375f2 Add basic site checks 2021-08-08 12:35:35 -05:00
be7d1a24d6 Auto-restart nagios when its config changes 2021-08-08 12:15:10 -05:00
e3c5c00272 Fix not including the zerotier playbook 2021-08-08 02:28:22 -05:00
4c2bfb996c Actually add that Nagios template 2021-08-08 02:24:59 -05:00
e968d4a7cf Fix up that jank config and make it actually totally usable 2021-08-08 01:20:48 -05:00
2f06fe61e0 Add pynetbox to important things 2021-08-08 00:30:41 -05:00
5d5cab59eb Add Nagios and some dysfunctional templating code 2021-08-08 00:28:25 -05:00
691a934297 Genericize the inclusion of libraspberrypi-bin 2021-08-07 17:23:15 -05:00
d68e3430a8 Modularize zerotier as well 2021-08-07 17:14:28 -05:00
759df2f593 Allow for dynamic tagging of ansible-pull hosts 2021-08-07 17:09:20 -05:00
07ea9806da Docker goes on everything 2021-08-07 16:55:28 -05:00
38f70d0fca Unify motd definition 2021-08-07 16:52:19 -05:00
30dd4ff8dc Divide webservices into task files 2021-08-07 16:49:24 -05:00
69f3edcf2b Clean out deprecated k8s garbage 2021-08-07 16:31:36 -05:00
55304ac4d9 Rename pistorage to tags_pistorage 2021-08-07 12:16:07 -05:00
340da1926e Move gameservers into task files 2021-08-07 12:08:29 -05:00
d6328146b3 Add nfs-common to common role 2021-08-07 11:52:03 -05:00
caabd61057 Revert "Break out testing into its own triple-parallelized flow" 2021-08-05 11:50:08 -05:00
    This reverts commit 9e5e2a23d4.
9e5e2a23d4 Break out testing into its own triple-parallelized flow 2021-08-05 01:27:41 -05:00
0c1fab838f Run test plays on scheduled jobs 2021-08-05 01:10:37 -05:00
    It makes sense to skip the test on a pipeline since it just
    signifies an application update or a re-run and probably wants
    to complete quickly. It does not make sense to get rid of our
    safeguards on a job that runs at 1AM every night.