Commit Graph

1929 Commits

Author SHA1 Message Date
Aman Agrawal
ae4d65ad7f hooks: Make deploy notifications best-effort.
The pre-deploy zulip_notify hook posts an informational "Starting
deploy" message; combined with `set -e`, a failure to deliver it
(e.g. the bot lacks posting permission on the target channel, or
the server is unreachable) propagated out of the hook, and
run_hooks.py treats any pre-deploy hook failure as fatal. A
transient chat-notification problem could thus block an urgent
deploy.

Swallow delivery failures in the shared zulip_send helper after
logging them, so neither the pre- nor post-deploy notification can
abort a deploy. The post-deploy hooks were already best-effort;
this brings the gating pre-deploy notification in line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 18:22:16 +08:00
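The "log it, then swallow it" pattern described in ae4d65ad7f can be sketched as follows. The real helper is the shell function `zulip_send`; this Python version only illustrates the idea, and `send_best_effort` and `unreachable_server` are hypothetical names that do not appear in the commit.

```python
import logging
from typing import Callable

logger = logging.getLogger("deploy_hooks")


def send_best_effort(send: Callable[[str], None], message: str) -> bool:
    """Try to deliver a notification; log and swallow any failure.

    Returns True on delivery and False otherwise, so the caller can
    carry on with the deploy either way.
    """
    try:
        send(message)
    except Exception as exc:  # deliberately broad: delivery must never be fatal
        logger.warning("Could not deliver deploy notification: %s", exc)
        return False
    return True


def unreachable_server(message: str) -> None:
    # Hypothetical stand-in for a chat client whose server cannot be reached.
    raise ConnectionError("server is unreachable")


if __name__ == "__main__":
    delivered = send_best_effort(unreachable_server, "Starting deploy")
    print("delivered:", delivered)
```

The broad `except` is the point of the design: a notification is informational, so no delivery error should be able to reach the code that decides whether the deploy proceeds.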
Aman Agrawal
3ae2232e2e kandra: Allow frontend access to camo from port 9292.
Partial revert from d8f47d7cc2.

Port 9292 is used both to serve metrics to prometheus and for serving
frontend. Only allowing prometheus access to port 9292 broke image
previews for external image uploads since frontend was not able
to access camo via port 9292.
2026-05-22 11:46:19 +05:30
Alex Vandiver
9b4f2baba5 kandra: Remove firewall rule for incoming port 3000.
Teleport smuggles this in via localhost -- and grafana
is already only listening on localhost.
2026-05-21 01:01:03 -04:00
Alex Vandiver
32023d7cf7 kandra: Close down services on prometheus host.
These were missed in e317fb5582.
2026-05-21 01:01:03 -04:00
Alex Vandiver
32ea23e561 kandra: Gracefully reload Teleport when its config changes.
A change to /etc/teleport_*.yaml today triggers a hard stop+start of
the teleport_$part unit, severing every active SSH, database, and
app-proxy session holding a connection through that node.  If your
zulip-puppet-apply is being controlled via Teleport SSH, this can
easily lead to bad state.

Teleport's signal handling supports a fork+drain reload on SIGHUP[^1],
which spawns a new daemon to serve new connections, and only shuts the
old process down after existing connections close.

Route the YAML config notifies through a `systemctl reload` exec so we
actually use the `ExecReload` we already defined, which
exercises that codepath.  Unit-file and package changes still notify
the service directly to get a real restart.

[^1]: https://goteleport.com/docs/reference/deployment/signals/
2026-05-20 16:52:18 -04:00
Alex Vandiver
e317fb5582 kandra: Make exporter ports listen only on localhost.
Scrapes now arrive via the Teleport tunnel which terminates at
localhost on each host, so the metrics ports no longer need to
listen on any other interface.
2026-05-19 10:26:52 -04:00
Alex Vandiver
3aade3588e kandra: Close down access to exporter ports.
These are now accessed through Teleport apps.
2026-05-19 10:26:52 -04:00
Alex Vandiver
15daadd56b kandra: Update prometheus config to use teleport-sd. 2026-05-19 10:26:52 -04:00
Alex Vandiver
0dd0143060 kandra: Use tbot application proxy + teleport-sd for service discovery.
This makes the Prometheus configuration no longer have to know
anything about which hosts run which exporters; instead, hosts
register the exporter in Teleport, and Prometheus asks Teleport which
instances of a given exporter it knows about.
2026-05-19 10:26:52 -04:00
Alex Vandiver
4320986c74 kandra: Add a Teleport app for each Prometheus exporter. 2026-05-19 10:26:52 -04:00
Alex Vandiver
aa6c135cd4 kandra: Switch 127.0.0.1 to localhost, so it works on ipv6 hosts.
For example, the katex server binds to [::]:9700.
2026-05-19 10:26:52 -04:00
Alex Vandiver
1635fa5801 kandra: Use to_yaml when writing YAML data. 2026-05-19 10:26:52 -04:00
Alex Vandiver
e2d426292c kandra: Give the teleport server its own data_dir.
The ssh node and the auth node should not share a data_dir.
2026-05-19 10:26:52 -04:00
Alex Vandiver
739162d8bc kandra: Remove explicit return_per_object_metrics setting in rabbitmq.
We already accomplish this by using the dedicated metrics endpoint that
returns that variant[^1].

[^1]: https://www.rabbitmq.com/docs/prometheus#per-object-endpoint
2026-05-14 00:10:14 -04:00
Alex Vandiver
f7808492c9 kandra: Switch Teleport to also bind ipv6 addresses. 2026-05-12 01:53:09 -04:00
Alex Vandiver
0c8c5ec7d1 kandra: Note that port 3025 is auth <-> proxy, which is the same host. 2026-05-12 01:53:09 -04:00
Alex Vandiver
29b60c0d77 kandra: Switch Teleport to multiplexed port 443. 2026-05-12 01:53:09 -04:00
Anders Kaseorg
d4d503f39b requirements: Remove dateutil.
This removes

- an unclear fuzzy syntax that had been incorrectly accepted by our
`<time:…>` Markdown extension and could not be reproducibly parsed
without a specific Python library (even the UNIX timestamp part did
not work reliably: some UNIX timestamps were instead parsed as
YYYYMMDD);

- a fundamentally ambiguous ad-hoc list of three-letter timezone
abbreviations that we had needed to manually disambiguate by some kind
of subjective popularity;

- an unpleasant dependency of the `pg_backup_and_purge` script that we
had needed to install system-wide because there might not be a
virtualenv set up.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2026-05-10 00:21:37 -07:00
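The ambiguity called out in the first bullet of d4d503f39b is easy to reproduce with the standard library alone. The sketch below is not Zulip's parser; `parse_strict` is a hypothetical helper showing one way to refuse ambiguous input instead of guessing.

```python
from datetime import datetime, timezone


def parse_strict(value: str) -> datetime:
    """Accept only an ISO 8601 datetime that carries an explicit UTC offset.

    All-digit strings are rejected rather than guessed at, because a value
    such as "20240101" could be a calendar date or a UNIX timestamp.
    """
    if value.isdigit():
        raise ValueError(f"ambiguous all-digit value: {value!r}")
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"missing UTC offset: {value!r}")
    return parsed


# The same eight digits, read two ways, land decades apart.
as_timestamp = datetime.fromtimestamp(20240101, tz=timezone.utc)
as_date = datetime.strptime("20240101", "%Y%m%d").replace(tzinfo=timezone.utc)

if __name__ == "__main__":
    print("as UNIX timestamp:", as_timestamp.isoformat())
    print("as YYYYMMDD:      ", as_date.isoformat())
    print("strict:           ", parse_strict("2024-01-01T00:00:00+00:00").isoformat())
```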
Alex Vandiver
f7bc4e97df puppet: Create /home/zulip/uploads at install time.
Previously, the local uploads directory was created lazily by the
upload code as files were written.  This worked but left the
directory absent immediately after install, which is awkward to
reason about and breaks the upcoming check_uploads_settings system
check that verifies the configured upload directory exists.

The path is documented as something administrators may replace with
a symlink to a different storage location, so we use an exec with a
'test -d' guard (which follows symlinks) rather than a file resource
that would replace the symlink with a fresh directory.
2026-05-09 23:18:47 -07:00
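The reason f7bc4e97df guards on a symlink-following directory test can be shown in a few lines. This is a Python analogue of the `test -d` guard, not the Puppet resource itself; `ensure_directory` and the temporary paths are hypothetical.

```python
import os
import tempfile


def ensure_directory(path: str) -> bool:
    """Create `path` unless it already resolves to a directory.

    os.path.isdir follows symlinks (as `test -d` does), so a symlink that
    points at a directory counts as present and is left untouched.
    Returns True if a directory was created.
    """
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        storage = os.path.join(root, "storage")
        uploads = os.path.join(root, "uploads")
        os.mkdir(storage)
        os.symlink(storage, uploads)  # administrator-managed symlink
        created = ensure_directory(uploads)
        print("created:", created, "still a symlink:", os.path.islink(uploads))
```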
Alex Vandiver
97a8a5f1a0 nginx: Make Tornado /events locations exact matches.
`location /api/v1/events` is a *prefix-match*, and as such passes any
URI starting with `/api/v1/events` through to Tornado -- including
encoded oddities like `/api/v1/events%3fdont%255fblock=false` (whose
decoded $uri still has the prefix) and `/api/v1/events/internal`,
which is meant to be reachable only via the loopback interface but was
being proxied to Tornado from the public socket.  Tornado's
internal_api_view rejects external callers both via its REMOTE_ADDR
check and its `SHARED_SECRET` check, so this was not exploitable, but
a Tornado worker still had to handle each such request just to 403 it.

Switch to exact matches, as was likely intended all along, which lets
those requests fall through to Django/uWSGI and 404 without ever
waking Tornado.  The legitimate internal callers in
zerver/tornado/django_api.py talk to http://127.0.0.1:<tornado-port>
directly, so they are unaffected, as is the X-Accel-Redirect path
served by the /internal/tornado/ regex location.
2026-04-27 09:45:18 -07:00
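The difference between the two location styles in 97a8a5f1a0 can be modelled roughly as below. This is a deliberately simplified sketch: it decodes the request path once, as nginx does for `$uri`, and ignores path normalization and the priority rules between location types. The function names are hypothetical.

```python
from urllib.parse import unquote

EVENTS_PATH = "/api/v1/events"


def prefix_match(raw_path: str) -> bool:
    """Model `location /api/v1/events`: the decoded path starts with it."""
    return unquote(raw_path).startswith(EVENTS_PATH)


def exact_match(raw_path: str) -> bool:
    """Model `location = /api/v1/events`: the decoded path equals it."""
    return unquote(raw_path) == EVENTS_PATH


if __name__ == "__main__":
    for path in (
        "/api/v1/events",
        "/api/v1/events/internal",
        "/api/v1/events%3fdont%255fblock=false",
    ):
        print(f"{path}: prefix={prefix_match(path)} exact={exact_match(path)}")
```

Only the first path satisfies the exact match; the other two satisfy the prefix match alone, which is why they previously reached Tornado.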
Alex Vandiver
ea2c37cf50 puppet: get_django_setting_slow returns nil if there are no deploy dirs. 2026-03-09 21:43:06 -07:00
iofq
05392a74bf puppet: Install logrotate package on Debian systems.
A default Debian/Ubuntu server image comes with the `logrotate`
package installed, but the `ubuntu` Docker image does not. This causes
the Zulip Docker install to not rotate its logfiles, despite having
logrotate configuration files installed.

Add `logrotate` to the list of required packages for Debian-based
systems in the puppet manifest, to ensure installation is enforced on
all target platforms, including the Docker image.

Fixes: zulip/docker-zulip#263
2026-02-26 15:52:53 -05:00
Alex Vandiver
cbcc588999 kandra: Fix grafana tarball directory prefix.
Apparently they now build this with grafana-1.2.3/ as a prefix, not
grafana-v1.2.3/
2026-02-25 23:46:44 -05:00
Alex Vandiver
5c1d6a8c98 puppet: Fix the wal-g package location and hashes. 2026-02-25 15:10:57 -08:00
Alex Vandiver
f2a5dc949a puppet: Update dependencies. 2026-02-24 08:59:08 -08:00
Alex Vandiver
0f67aa1ab2 puppet: Rename redis reconfiguration Exec to better name. 2026-02-14 21:04:12 -08:00
Alex Vandiver
874f5f8441 puppet: Remove zuli-redis.conf workaround from Zulip 2.0.0.
It is no longer possible to upgrade directly from Zulip 2.0.0, so no
current install needs this.
2026-02-14 21:04:12 -08:00
Alex Vandiver
3495258664 puppet: Remove /run/redis explicit creation.
The packages now handle this themselves, and use mode 2755, which we
should not fight them about.
2026-02-14 21:04:07 -08:00
Alex Vandiver
147e98c03b puppet: Provide zulip_version from the same tree as puppet is run.
This fixes a bug where the version puppet provided was the _current_
version, which meant that what it provided lagged by one deployment.
It also did not function on first install, as
`/home/zulip/deployments/current` does not exist yet on first puppet
run.

Examine the `version.py` from the same tree that puppet is being
run from, which addresses both of these issues.
2026-02-14 21:01:56 -08:00
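The lag described in 147e98c03b comes from reading the version out of the wrong tree. A minimal sketch of the corrected lookup follows; the `ZULIP_VERSION = "..."` assignment format, the version numbers, and the directory names are assumptions made for illustration, and the real fact is implemented in Puppet rather than Python.

```python
import os
import re
import tempfile


def read_version(tree_root: str) -> str:
    """Return the version string assigned in <tree_root>/version.py."""
    with open(os.path.join(tree_root, "version.py")) as f:
        match = re.search(r'^ZULIP_VERSION = "([^"]+)"', f.read(), re.MULTILINE)
    if match is None:
        raise ValueError("no version assignment found")
    return match.group(1)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: the deployed tree and the tree being installed.
        for name, version in (("current", "1.0"), ("next", "2.0")):
            os.mkdir(os.path.join(root, name))
            with open(os.path.join(root, name, "version.py"), "w") as f:
                f.write(f'ZULIP_VERSION = "{version}"\n')
        # Reading from the tree being applied avoids the one-deploy lag.
        print("deployed:", read_version(os.path.join(root, "current")))
        print("applying:", read_version(os.path.join(root, "next")))
```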
Alex Vandiver
aef2d28194 nginx: Hardcode a Host: header to uwsgi when making health checks.
Fixes: #37805.
2026-02-11 21:04:36 -08:00
Alex Vandiver
22ad7cd4ee nginx: Add preload to HSTS. 2026-02-09 08:54:05 -08:00
Alex Vandiver
c18a7eabbe nginx: Add includeSubdomains to HSTS. 2026-02-09 08:54:05 -08:00
Alex Vandiver
1ba90e5faa nginx: Increase HSTS to 1 year, from 6 months. 2026-02-09 08:54:05 -08:00
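Taken together, the three HSTS commits above (1ba90e5faa, c18a7eabbe, 22ad7cd4ee) imply a header of roughly the following shape. This is a sketch only: it assumes "1 year" means 365 days, and the exact value in the nginx configuration is not shown here. `hsts_header` is a hypothetical helper.

```python
from datetime import timedelta


def hsts_header(max_age: timedelta, include_subdomains: bool, preload: bool) -> str:
    """Build a Strict-Transport-Security header value."""
    parts = [f"max-age={int(max_age.total_seconds())}"]
    if include_subdomains:
        parts.append("includeSubDomains")
    if preload:
        parts.append("preload")
    return "; ".join(parts)


if __name__ == "__main__":
    # Assumes a 365-day year: 365 * 86400 = 31536000 seconds.
    print("Strict-Transport-Security:", hsts_header(timedelta(days=365), True, True))
```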
Alex Vandiver
687d2e0bd3 nginx: Add missing semicolon to CSP definition. 2026-02-09 08:51:58 -08:00
Anders Kaseorg
4913fe228d env-wal-g: Fix inappropriate 2>&1.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2026-02-06 21:36:20 -05:00
Anders Kaseorg
5a8c5cb563 puppet: Use Open3 for safer command execution.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2026-02-06 21:36:20 -05:00
Alex Vandiver
89473cbca2 nginx: Set Cross-Origin-Opener-Policy as defense in depth.
While we set rel="noopener", this provides additional protection
against tabnabbing.
2026-02-06 11:54:51 -08:00
Alex Vandiver
1a46153264 nginx: Set Referrer-Policy as defense in depth for Referer: headers.
While we set `rel=noreferrer` on our links, this provides additional
protections.
2026-02-06 11:54:51 -08:00
Alex Vandiver
e66cfe02db nginx: Refactor immutable cache headers.
This fixes a bug where, because nginx does not cascade `add_header`,
Astro headers improperly did not include the default headers.
2026-02-05 17:10:14 -08:00
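The inheritance rule behind the bug fixed in e66cfe02db is that nginx inherits `add_header` directives from the enclosing level only when the current level defines none of its own. A toy model of that rule, with hypothetical header values and a hypothetical function name:

```python
from typing import Dict, List


def effective_headers(levels: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the headers sent, given the add_header set at each level.

    `levels` is ordered outermost (http) to innermost (location). A level
    that defines any add_header replaces everything inherited, so the
    innermost non-empty set wins outright.
    """
    for headers in reversed(levels):
        if headers:
            return dict(headers)
    return {}


if __name__ == "__main__":
    server = {"X-Content-Type-Options": "nosniff"}
    location = {"Cache-Control": "public, max-age=31536000, immutable"}
    # The location's own add_header silently drops the server-level default.
    print(effective_headers([server, location]))
    # A location with no add_header of its own inherits the default.
    print(effective_headers([server, {}]))
```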
Alex Vandiver
df4d695d4f nginx: Merge two adjacent location blocks. 2026-02-05 17:10:14 -08:00
Alex Vandiver
10c13e6367 nginx: Always send X-Content-Type-Options, even on error pages. 2026-02-05 17:08:47 -08:00
Alex Vandiver
f0cc982e52 postgresql: Default to io_method=io_uring on PostgreSQL 18.
This is more performant than the PostgreSQL 18 default of
`io_method=workers`, but requires kernel 5.1.  All supported OSes of
Zulip have at least that -- however, it may not be available inside
containers, so add a puppet fact to check the syscall before enabling
it in PostgreSQL.
2026-02-02 16:26:08 -08:00
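Probing the syscall, rather than comparing kernel versions, is what lets f0cc982e52 cope with containers that block io_uring on an otherwise new enough kernel. One way such a probe could look is sketched below; it is not the Puppet fact from the commit. It is Linux-only, and the syscall number 425 for `io_uring_setup` is an assumption that holds on common architectures but should be verified for your platform.

```python
import ctypes
import errno
import sys

# Assumed syscall number for io_uring_setup(2); verify for your platform.
SYS_IO_URING_SETUP = 425


def io_uring_available() -> bool:
    """Probe whether the io_uring_setup syscall can be reached at all.

    The call passes deliberately invalid arguments, so it is expected to
    fail; what matters is how. ENOSYS means the kernel lacks io_uring, and
    EPERM means something (such as a container seccomp profile) blocks it.
    Any other errno means the syscall itself is reachable.
    """
    if not sys.platform.startswith("linux"):
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    result = libc.syscall(SYS_IO_URING_SETUP, 0, None)
    if result >= 0:
        return True  # not expected with these arguments
    return ctypes.get_errno() not in (errno.ENOSYS, errno.EPERM)


if __name__ == "__main__":
    print("io_uring available:", io_uring_available())
```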
Lauryn Menard
6f4e88a441 demo-orgs: Update Welcome bot string to use global time.
Updates the demo organization warning for demo creators to use a
global timestamp for when the demo organization will be deleted
by the archive-messages cronjob (based on the realm's scheduled
deletion timestamp).
2026-01-28 09:14:07 -08:00
Arun-kushwaha007
1e7974909d letsencrypt: Enable strict shell mode for email server restart.
Add strict shell options to the email server restart script. This
script is a single command wrapper with no variables or pipelines,
making strict mode unambiguous and safe.

Fixes part of #20748.
2026-01-27 14:16:39 -05:00
Arun Kushwaha
19e9a4e44b tooling: Enable strict shell mode in selected scripts.
Add strict shell options (set -euo pipefail / set -eu) to a small
set of simple shell scripts that do not rely on unset variables
or pipeline exit-code masking.

Each script was reviewed line by line to confirm strict mode is
safe and that stopping immediately on errors is the correct
behavior for these scripts.

Fixes part of #20748.
2026-01-21 09:53:52 -08:00
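What each strict-mode option changes, and why the two commits above (1e7974909d, 19e9a4e44b) review scripts for unset variables and pipeline exit codes first, can be observed by running small snippets through bash. The sketch assumes `bash` is on `PATH`; `run_bash` is a hypothetical helper and `SURELY_UNSET_VARIABLE` is assumed not to be set in the environment.

```python
import subprocess


def run_bash(script: str) -> int:
    """Run a bash snippet and return its exit status."""
    return subprocess.run(["bash", "-c", script], check=False).returncode


# Without pipefail, a pipeline reports only its last command's status.
masked = run_bash("false | true")
# With pipefail, a failure anywhere in the pipeline is reported.
unmasked = run_bash("set -o pipefail; false | true")
# With -u, expanding an unset variable is an error instead of an empty string.
unset_var = run_bash('set -u; echo "$SURELY_UNSET_VARIABLE"')
# With -e, the script stops at the first failing command.
stops_early = run_bash("set -e; false; exit 0")

if __name__ == "__main__":
    print("no pipefail:", masked)
    print("pipefail:   ", unmasked)
    print("set -u:     ", unset_var)
    print("set -e:     ", stops_early)
```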
Alex Vandiver
6f29077560 puppet: Ensure latest ca-certificates is installed.
This is particularly necessary if `application_server.custom_ca_path`
is in use, as that causes the system CA bundle to be used for all
outgoing `requests` connections, instead of the standard `certifi`
package.
2026-01-21 09:33:55 -08:00
Alex Vandiver
ab743b19de puppet: Allow using the system CA bundle, with custom CA cert.
Fixes #18752.
2026-01-21 09:33:55 -08:00
Alex Vandiver
ec81f5f29b kandra: Chown to zulip:zulip after mounting.
Remove the user/group on the mountpoint, since it will be wrong if the
mount has already happened.  We cannot have a _second_ occurrence of
the directory to set ownership after the mount, so we use an exec to
manage it instead.
2026-01-16 09:40:54 -08:00
Alex Vandiver
6ad3dce8db logging: Split out registration logging. 2026-01-07 15:06:27 -08:00
Alex Vandiver
2a9b7f132b kandra: Create a tmp mountpoint for frontend hosts. 2026-01-07 11:13:12 -08:00