PostgreSQL's `default_statistics_target` is used to track how many
"most common values" ("MCVs") for a column when performing an
`ANALYZE`. For `tsvector` columns, the number of values is actually
10x this number, because each row contains multiple values for the
column[1]. The `default_statistics_target` defaults to 100[2], and
Zulip does not adjust this at the server level.
This translates to 1000 entries in the MCV for tsvectors. For
large tables like `zerver_messages`, a too-small value can cause
mis-planned query plans. The query planner assumes that any
entry *not* found in the MCV list is *half* as likely as the
least-likely value in it. If the table is large, and the MCV list is
too short (as 1000 values is for large deployments), arbitrary
no-in-the-MCV words will often be estimated by the query planner to
occur comparatively quite frequently in the index. Based on this, the
planner will instead choose to scan all messages accessible by the
user, filtering by word in tsvector, instead of using the tsvector
index and filtering by being accessible to the user. This results in
degraded performance for word searching.
However, PostgreSQL allows adjustment of this value on a per-column
basis. Add a migration to adjust the value up to 10k for
`search_tsvector` on `zerver_message`, which results in 100k entries
in that MCV list.
PostgreSQL's documentation says[3]:
> Raising the limit might allow more accurate planner estimates to be
> made, particularly for columns with irregular data distributions, at
> the price of consuming more space in `pg_statistic` and slightly
> more time to compute the estimates.
These costs seem adequate for the utility of having better search.
In the event that the pgroonga backend is in use, these larger index
statistics are simply wasted space and `VACUUM` computational time,
but the costs are likely still reasonable -- even 100k values are
dwarfed by the size of the database needed to generate 100k unique
entries in tsvectors.
[1]: https://github.com/postgres/postgres/blob/REL_14_4/src/backend/utils/adt/array_typanalyze.c#L261-L267
[2]: https://www.postgresql.org/docs/14/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET
[3]: https://www.postgresql.org/docs/14/planner-stats.html#id-1.5.13.5.3
|
||
|---|---|---|
| .github | ||
| .tx | ||
| .vscode | ||
| analytics | ||
| confirmation | ||
| corporate | ||
| docs | ||
| frontend_tests | ||
| locale | ||
| pgroonga | ||
| puppet | ||
| requirements | ||
| scripts | ||
| static | ||
| stubs/taint | ||
| templates | ||
| tools | ||
| var/puppeteer | ||
| zerver | ||
| zilencer | ||
| zproject | ||
| .browserslistrc | ||
| .codecov.yml | ||
| .codespellignore | ||
| .editorconfig | ||
| .eslintignore | ||
| .eslintrc.json | ||
| .gitattributes | ||
| .gitignore | ||
| .gitlint | ||
| .mailmap | ||
| .npmignore | ||
| .prettierignore | ||
| .pyre_configuration | ||
| .sonarcloud.properties | ||
| .yarnrc | ||
| babel.config.js | ||
| CODE_OF_CONDUCT.md | ||
| CONTRIBUTING.md | ||
| Dockerfile-postgresql | ||
| LICENSE | ||
| manage.py | ||
| NOTICE | ||
| package.json | ||
| postcss.config.js | ||
| prettier.config.js | ||
| pyproject.toml | ||
| README.md | ||
| SECURITY.md | ||
| setup.cfg | ||
| stylelint.config.js | ||
| tsconfig.json | ||
| Vagrantfile | ||
| version.py | ||
| webpack.config.ts | ||
| yarn.lock | ||
Zulip overview
Zulip is an open-source team collaboration tool with unique topic-based threading that combines the best of email and chat to make remote work productive and delightful. Fortune 500 companies, leading open source projects, and thousands of other organizations use Zulip every day. Zulip is the only modern team chat app that is designed for both live and asynchronous conversations.
Zulip is built by a distributed community of developers from all around the world, with 74+ people who have each contributed 100+ commits. With over 1000 contributors merging over 500 commits a month, Zulip is the largest and fastest growing open source team chat project.
Come find us on the development community chat!
Getting started
-
Contributing code. Check out our guide for new contributors to get started. We have invested into making Zulip’s code uniquely readable, well tested, and easy to modify. Beyond that, we have written an extraordinary 150K words of documentation on how to contribute to Zulip.
-
Contributing non-code. Report an issue, translate Zulip into your language, or give us feedback. We'd love to hear from you, whether you've been using Zulip for years, or are just trying it out for the first time.
-
Checking Zulip out. The best way to see Zulip in action is to drop by the Zulip community server. We also recommend reading about Zulip's unique approach to organizing conversations.
-
Running a Zulip server. Self host Zulip directly on Ubuntu or Debian Linux, in Docker, or with prebuilt images for Digital Ocean and Render. Learn more about self-hosting Zulip.
-
Using Zulip without setting up a server. Learn about Zulip Cloud hosting options. Zulip sponsors free Zulip Cloud Standard for hundreds of worthy organizations, including fellow open-source projects.
-
Participating in outreach programs like Google Summer of Code and Outreachy.
-
Supporting Zulip. Advocate for your organization to use Zulip, become a sponsor, write a review in the mobile app stores, or help others find Zulip.
You may also be interested in reading our blog, and following us on Twitter and LinkedIn.
Zulip is distributed under the Apache 2.0 license.