The pre-deploy zulip_notify hook posts an informational "Starting
deploy" message; combined with `set -e`, a failure to deliver it
(e.g. the bot lacks posting permission on the target channel, or
the server is unreachable) propagated out of the hook, and
run_hooks.py treats any pre-deploy hook failure as fatal. A
transient chat-notification problem could thus block an urgent
deploy.
Swallow delivery failures in the shared zulip_send helper after
logging them, so neither the pre- nor post-deploy notification can
abort a deploy. The post-deploy hooks were already best-effort;
this brings the gating pre-deploy notification in line.
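
The swallow-after-logging pattern can be sketched as follows. This is a
Python analogue for illustration only, since the helper described above is
a shell function; `send_best_effort` and `unreachable_server` are
hypothetical names, not part of the hooks.

```python
import logging
from typing import Callable

logger = logging.getLogger("deploy_hooks")  # hypothetical logger name


def send_best_effort(send: Callable[[str], None], message: str) -> bool:
    """Try to deliver a notification; log and swallow any failure.

    Returns True on delivery and False otherwise, so a caller can inspect
    the outcome without the failure ever propagating.
    """
    try:
        send(message)
    except Exception as e:  # deliberately broad: no delivery error may abort a deploy
        logger.warning("Could not deliver notification %r: %s", message, e)
        return False
    return True


def unreachable_server(message: str) -> None:
    # Hypothetical transport that always fails, standing in for an
    # unreachable server or a bot without posting permission.
    raise ConnectionError("server unreachable")
```

Calling `send_best_effort(unreachable_server, "Starting deploy")` logs a
warning and returns False instead of raising, so the caller carries on.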
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Partial revert of d8f47d7cc2.
Port 9292 is used both to serve metrics to Prometheus and to serve the
frontend. Allowing only Prometheus to reach port 9292 broke image
previews for external image uploads, since the frontend could no
longer reach camo on that port.
A change to /etc/teleport_*.yaml today triggers a hard stop+start of
the teleport_$part unit, severing every active SSH, database, and
app-proxy session holding a connection through that node. If your
zulip-puppet-apply is being controlled via Teleport SSH, this can
easily lead to bad state.
Teleport's signal handling supports a fork+drain reload on SIGHUP[^1],
which spawns a new daemon to serve new connections, and only shuts the
old process down after existing connections close.
Route the YAML config notifies through a `systemctl reload` exec so we
actually use the `ExecReload` we already defined, which exercises
that codepath. Unit-file and package changes still notify
the service directly to get a real restart.
[^1]: https://goteleport.com/docs/reference/deployment/signals/
Scrapes now arrive via the Teleport tunnel which terminates at
localhost on each host, so the metrics ports no longer need to
listen on any other interface.
This makes the Prometheus configuration no longer have to know
anything about which hosts run which exporters; instead, hosts
register the exporter in Teleport, and Prometheus asks Teleport which
instances of a given exporter it knows about.
This removes
- an unclear fuzzy syntax that had been incorrectly accepted by our
`<time:…>` Markdown extension and could not be reproducibly parsed
without a specific Python library (even the UNIX timestamp part did
not work reliably: some UNIX timestamps were instead parsed as
YYYYMMDD);
- a fundamentally ambiguous ad-hoc list of three-letter timezone
abbreviations that we had needed to manually disambiguate by some kind
of subjective popularity;
- an unpleasant dependency of the `pg_backup_and_purge` script that we
had needed to install system-wide because there might not be a
virtualenv set up.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Previously, the local uploads directory was created lazily by the
upload code as files were written. This worked but left the
directory absent immediately after install, which is awkward to
reason about and breaks the upcoming check_uploads_settings system
check that verifies the configured upload directory exists.
The path is documented as something administrators may replace with
a symlink to a different storage location, so we use an exec with a
'test -d' guard (which follows symlinks) rather than a file resource
that would replace the symlink with a fresh directory.
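
The symlink-preserving guard can be sketched in Python, where
`os.path.isdir` follows symlinks in the same way as `test -d`. The names
`ensure_directory` and `demo_symlink_preserved` are hypothetical; the
change itself is a puppet exec, not Python, and the demo assumes a POSIX
system where symlinks can be created freely.

```python
import os
import tempfile


def ensure_directory(path: str) -> bool:
    """Create path only if it does not already resolve to a directory.

    os.path.isdir() follows symlinks, so a symlink pointing at a directory
    elsewhere is left untouched. Returns True if a directory was created.
    """
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


def demo_symlink_preserved() -> tuple:
    # An administrator has replaced "uploads" with a symlink to other storage.
    with tempfile.TemporaryDirectory() as tmp:
        storage = os.path.join(tmp, "storage")
        os.mkdir(storage)
        uploads = os.path.join(tmp, "uploads")
        os.symlink(storage, uploads)
        created = ensure_directory(uploads)
        # Report whether anything was created and whether the symlink survived.
        return created, os.path.islink(uploads)
```

Here the guard creates nothing and the symlink survives. An
unconditional "make this path a real directory" step is what would
replace it.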
`location /api/v1/events` is a *prefix-match*, and as such passes any
URI starting with `/api/v1/events` through to Tornado -- including
encoded oddities like `/api/v1/events%3fdont%255fblock=false` (whose
decoded $uri still has the prefix) and `/api/v1/events/internal`,
which is meant to be reachable only via the loopback interface but was
being proxied to Tornado from the public socket. Tornado's
internal_api_view rejects external callers both via its REMOTE_ADDR
check and its `SHARED_SECRET` check, so this was not exploitable, but
a Tornado worker still had to handle each such request just to 403 it.
Switch to exact matches, as was likely intended all along, which lets
those requests fall through to Django/uWSGI and 404 without ever
waking Tornado. The legitimate internal callers in
zerver/tornado/django_api.py talk to http://127.0.0.1:<tornado-port>
directly, so they are unaffected, as is the X-Accel-Redirect path
served by the /internal/tornado/ regex location.
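
The difference between the two match types can be modelled roughly in
Python. This is a simplified sketch, not nginx's actual matcher: it
decodes the request path once (as nginx does to produce $uri) and
ignores nginx's other normalization steps. The function names are
hypothetical.

```python
from urllib.parse import unquote

EVENTS = "/api/v1/events"


def prefix_location_matches(raw_path: str, prefix: str = EVENTS) -> bool:
    # Models `location /api/v1/events`: any decoded URI that starts
    # with the prefix is matched.
    return unquote(raw_path).startswith(prefix)


def exact_location_matches(raw_path: str, exact: str = EVENTS) -> bool:
    # Models `location = /api/v1/events`: only the exact decoded URI
    # is matched.
    return unquote(raw_path) == exact


# The request paths named above, plus the legitimate one.
PATHS = [
    "/api/v1/events",
    "/api/v1/events/internal",
    "/api/v1/events%3fdont%255fblock=false",
]
```

All three entries in `PATHS` satisfy the prefix model, while only the
first satisfies the exact model.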
A default Debian/Ubuntu server image comes with the `logrotate`
package installed, but the `ubuntu` Docker image does not. This causes
the Zulip Docker install to not rotate its logfiles, despite having
logrotate configuration files installed.
Add `logrotate` to the list of required packages for Debian-based
systems in the puppet manifest, to ensure installation is enforced on
all target platforms, including the Docker image.
Fixes: zulip/docker-zulip#263
This fixes a bug where the version puppet provided was the _current_
version, which meant that what it provided lagged by one deployment.
It also did not function on first install, as
`/home/zulip/deployments/current` does not exist yet on first puppet
run.
Examine the `version.py` from the same tree that puppet is being
run from, which addresses both of these issues.
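
A minimal sketch of reading the version from a given tree rather than
from `/home/zulip/deployments/current`. It assumes `version.py` contains
a line of the form `ZULIP_VERSION = "..."`; the helper names are
hypothetical, and the real fact may be implemented differently.

```python
import os
import re
from typing import Optional

# Assumed shape of the assignment inside version.py.
VERSION_RE = re.compile(r'^ZULIP_VERSION\s*=\s*"([^"]*)"', re.MULTILINE)


def parse_version(text: str) -> Optional[str]:
    """Extract the version string from the contents of a version.py."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def version_from_tree(tree_root: str) -> Optional[str]:
    """Return the version declared in <tree_root>/version.py, if any.

    Reading from the tree being applied avoids both the one-deployment
    lag and the missing `current` symlink on first install.
    """
    try:
        with open(os.path.join(tree_root, "version.py"), encoding="utf-8") as f:
            return parse_version(f.read())
    except OSError:
        # No readable version.py in this tree.
        return None
```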
This is more performant than the PostgreSQL 18 default of
`io_method=worker`, but requires Linux kernel 5.1 or later. All OSes
that Zulip supports ship at least that; however, the syscall may not
be available inside containers, so add a puppet fact to check for it
before enabling it in PostgreSQL.
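
One way to probe for the syscall, sketched in Python rather than as the
puppet fact itself. It assumes the syscall in question is
`io_uring_setup` (io_uring first appeared in Linux 5.1) and that its
number is 425, which holds on x86_64 and arm64. The function names are
hypothetical.

```python
import ctypes
import errno
import os
import sys

SYS_IO_URING_SETUP = 425  # assumption: x86_64 / arm64 syscall number


def errno_means_available(err: int) -> bool:
    # ENOSYS: the kernel does not implement the syscall.
    # EPERM / EACCES: it is blocked, e.g. by a container's seccomp policy.
    # Anything else (typically EFAULT for our NULL argument) means the
    # kernel accepted the call far enough to validate its arguments.
    return err not in (errno.ENOSYS, errno.EPERM, errno.EACCES)


def io_uring_available() -> bool:
    if not sys.platform.startswith("linux"):
        return False
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        libc.syscall.restype = ctypes.c_long
        # Deliberately invalid call: zero entries and a NULL params pointer.
        result = libc.syscall(
            ctypes.c_long(SYS_IO_URING_SETUP),
            ctypes.c_uint(0),
            ctypes.c_void_p(None),
        )
    except (OSError, AttributeError):
        return False
    if result >= 0:
        # Not expected with NULL params, but close the ring fd if we got one.
        os.close(result)
        return True
    return errno_means_available(ctypes.get_errno())
```

Probing the syscall directly, rather than comparing kernel versions,
also catches the container case, where the kernel is new enough but the
call is filtered.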
Updates the demo organization warning for demo creators to use a
global timestamp for when the demo organization will be deleted
by the archive-messages cronjob (based on the realm's scheduled
deletion timestamp).
Add strict shell options to the email server restart script. This
script is a single command wrapper with no variables or pipelines,
making strict mode unambiguous and safe.
Fixes part of #20748.
Add strict shell options (set -euo pipefail / set -eu) to a small
set of simple shell scripts that do not rely on unset variables
or pipeline exit-code masking.
Each script was reviewed line by line to confirm strict mode is
safe and that stopping immediately on errors is the correct
behavior for these scripts.
Fixes part of #20748.
This is particularly necessary if `application_server.custom_ca_path`
is in use, as that causes the system CA bundle to be used for all
outgoing `requests` connections, instead of the standard `certifi`
package.
Remove the user/group on the mountpoint, since it will be wrong if the
mount has already happened. We cannot have a _second_ occurrence of
the directory to set ownership after the mount, so we use an exec to
manage it instead.