ClickHouse

Member Since 2 years ago

United States of America

0 followers
0 following
101 repos
Activity
May 18 (13 hours ago)
issue

leegean issue comment ClickHouse/ClickHouse

leegean
leegean

About ClickHouse's memory management mechanism

Describe the issue: My server has 32 GiB of memory, and for experimental purposes I set max_server_memory_usage_to_ram_ratio to 0.2. Everything else is left at its defaults, but with only occasional ClickHouse queries the memory keeps growing, and old data in memory is never cleaned up to free memory for more urgent query processing requests.

Questions:

  1. What is ClickHouse's memory reclamation mechanism?
  2. When is memory reclaimed?
  3. How do I solve this problem?

Error message and/or stacktrace

2022.05.12 15:23:26.745074 [ 1106767 ] {} <Error> void DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(DB::TaskRuntimeDataPtr) [Queue = DB::MergeMutateRuntimeQueue]: Code: 241. DB::Exception: Memory limit (total) exceeded: would use 6.25 GiB (attempt to allocate chunk of 4200879 bytes), maximum: 6.25 GiB. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xaebed1a in /usr/bin/clickhouse
1. DB::Exception::Exception<char const*, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, long&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*&&, char const*&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&, long&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&) @ 0xaed6d0c in /usr/bin/clickhouse
2. MemoryTracker::allocImpl(long, bool, MemoryTracker*) @ 0xaed6904 in /usr/bin/clickhouse
3. DB::MarksInCompressedFile::MarksInCompressedFile(unsigned long) @ 0x155828a5 in /usr/bin/clickhouse
4. DB::MergeTreeMarksLoader::loadMarksImpl() @ 0x15581865 in /usr/bin/clickhouse
5. DB::MergeTreeMarksLoader::loadMarks() @ 0x15580de8 in /usr/bin/clickhouse
6. DB::MergeTreeReaderCompact::getReadBufferSize(std::__1::shared_ptr<DB::IMergeTreeDataPart const> const&, DB::MergeTreeMarksLoader&, std::__1::vector<std::__1::optional<unsigned long>, std::__1::allocator<std::__1::optional<unsigned long> > > const&, std::__1::deque<DB::MarkRange, std::__1::allocator<DB::MarkRange> > const&) @ 0x1557970e in /usr/bin/clickhouse
7. DB::MergeTreeReaderCompact::MergeTreeReaderCompact(std::__1::shared_ptr<DB::MergeTreeDataPartCompact const>, DB::NamesAndTypesList, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::UncompressedCache*, DB::MarkCache*, std::__1::deque<DB::MarkRange, std::__1::allocator<DB::MarkRange> >, DB::MergeTreeReaderSettings, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, double, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, double> > >, std::__1::function<void (DB::ReadBufferFromFileBase::ProfileInfo)> const&, int) @ 0x1557877d in /usr/bin/clickhouse
8. DB::MergeTreeDataPartCompact::getReader(DB::NamesAndTypesList const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::deque<DB::MarkRange, std::__1::allocator<DB::MarkRange> > const&, DB::UncompressedCache*, DB::MarkCache*, DB::MergeTreeReaderSettings const&, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, double, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, double> > > const&, std::__1::function<void (DB::ReadBufferFromFileBase::ProfileInfo)> const&) const @ 0x154e4e6c in /usr/bin/clickhouse
9. DB::MergeTreeSequentialSource::MergeTreeSequentialSource(DB::MergeTreeData const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, bool, bool, bool) @ 0x1559017c in /usr/bin/clickhouse
10. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::createMergedStream() @ 0x15436af6 in /usr/bin/clickhouse
11. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::prepare() @ 0x15434996 in /usr/bin/clickhouse
12. bool std::__1::__function::__policy_invoker<bool ()>::__call_impl<std::__1::__function::__default_alloc_func<DB::MergeTask::ExecuteAndFinalizeHorizontalPart::subtasks::'lambda'(), bool ()> >(std::__1::__function::__policy_storage const*) @ 0x15442bc9 in /usr/bin/clickhouse
13. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::execute() @ 0x1543940b in /usr/bin/clickhouse
14. DB::MergeTask::execute() @ 0x1543e3ba in /usr/bin/clickhouse
15. DB::MergePlainMergeTreeTask::executeStep() @ 0x1542ffac in /usr/bin/clickhouse
16. DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(std::__1::shared_ptr<DB::TaskRuntimeData>) @ 0xae95feb in /usr/bin/clickhouse
17. DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::threadFunction() @ 0xae95c39 in /usr/bin/clickhouse
18. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0xaf6546a in /usr/bin/clickhouse
19. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0xaf674a4 in /usr/bin/clickhouse
20. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xaf62837 in /usr/bin/clickhouse
21. ? @ 0xaf662fd in /usr/bin/clickhouse
22. ? @ 0x7ff75cf08609 in ?
23. clone @ 0x7ff75ce2d163 in ?
 (version 22.2.2.1)
leegean
leegean

(image) In the figure above, the overall memory footprint shows a linear upward trend; the points where memory drops to zero are where I rebooted.
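
A rough way to see where that memory is going is to query the server's own metrics (a sketch, not from the issue; metric names as of 22.x, and the mark cache is the consumer visible in the stack trace above):

-- Server-wide tracked memory (the number compared against the 6.25 GiB limit):
SELECT formatReadableSize(value) AS tracked FROM system.metrics WHERE metric = 'MemoryTracking';

-- Sizes of the main caches:
SELECT metric, formatReadableSize(value) AS size
FROM system.asynchronous_metrics
WHERE metric IN ('MarkCacheBytes', 'UncompressedCacheBytes');

-- Caches are kept up to their configured size; they can be dropped manually to release memory:
SYSTEM DROP MARK CACHE;
SYSTEM DROP UNCOMPRESSED CACHE;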

issue

den-crane issue ClickHouse/ClickHouse

den-crane
den-crane

New record was ignored with ReplicatedCollapsingMergeTree engine

With the ReplicatedCollapsingMergeTree engine, I insert one new record with the same key to update an old record in the database, but the new record is ignored, as recorded in the log. It works fine with the CollapsingMergeTree engine, without Replicated.

What's the suggestion to avoid it? Thanks!

We used ClickHouse-Keeper

ClickHouse version: 22.1.3.7

{54dd3654-493e-477a-8c35-7b7d07e3774e} chdb.dwd_ast_info (d2281671-feec-4554-aaa0-8230622f114d) (Replicated OutputStream): Block with ID all_6699375217494909757_14730988238813813170 already exists locally as part all_253_253_0; ignoring it.
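
A hedged note on the log line above: ReplicatedMergeTree tables deduplicate inserts by block hash, so a block identical to a recently inserted one is silently skipped; plain CollapsingMergeTree without Replicated does not do this. A minimal sketch of letting the "duplicate" row through again (the column names are illustrative, not taken from the issue):

SET insert_deduplicate = 0;  -- disable block-level dedup for this session
INSERT INTO chdb.dwd_ast_info (key, sign) VALUES (1, -1), (1, 1);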

issue

den-crane issue comment ClickHouse/ClickHouse

den-crane
den-crane

New record was ignored with ReplicatedCollapsingMergeTree engine

With the ReplicatedCollapsingMergeTree engine, I insert one new record with the same key to update an old record in the database, but the new record is ignored, as recorded in the log. It works fine with the CollapsingMergeTree engine, without Replicated.

What's the suggestion to avoid it? Thanks!

We used ClickHouse-Keeper

ClickHouse version: 22.1.3.7

{54dd3654-493e-477a-8c35-7b7d07e3774e} chdb.dwd_ast_info (d2281671-feec-4554-aaa0-8230622f114d) (Replicated OutputStream): Block with ID all_6699375217494909757_14730988238813813170 already exists locally as part all_253_253_0; ignoring it.

open pull request

rvasin wants to merge ClickHouse/ClickHouse

rvasin
rvasin

Add total_max_threads parameter

Changelog category:

  • New Feature

Changelog entry

Add a total_max_threads parameter to improve performance under high RPS by limiting the total number of threads used by all queries.

Closes #36551

See the attached article for details: article.pdf
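
As context for the review discussion below, a hedged sketch of observing the server-wide query and thread pressure this parameter is meant to cap (these metric names exist in system.metrics; the query is not part of the PR):

SELECT metric, value
FROM system.metrics
WHERE metric IN ('Query', 'GlobalThread', 'GlobalThreadActive');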

rvasin
rvasin

I did not understand: should I rename the parameter total_max_threads to max_threads_for_all_queries or not (and in which places)? If we decide to rename it, I would rename it everywhere to keep the naming consistent.

pull request

rvasin merge to ClickHouse/ClickHouse

rvasin
rvasin

Add total_max_threads parameter

Changelog category:

  • New Feature

Changelog entry

Add a total_max_threads parameter to improve performance under high RPS by limiting the total number of threads used by all queries.

Closes #36551

See the attached article for details: article.pdf

issue

taotaizhu-pw issue ClickHouse/ClickHouse

taotaizhu-pw
taotaizhu-pw

New record was ignored with ReplicatedCollapsingMergeTree engine

With the ReplicatedCollapsingMergeTree engine, I insert one new record with the same key as an old record in the database, but the new record is ignored, as recorded in the log. It works fine with the CollapsingMergeTree engine, without Replicated.

What's the suggestion to avoid it? Thanks!

We used ClickHouse-Keeper

ClickHouse version: 22.1.3.7

{54dd3654-493e-477a-8c35-7b7d07e3774e} chdb.dwd_ast_info (d2281671-feec-4554-aaa0-8230622f114d) (Replicated OutputStream): Block with ID all_6699375217494909757_14730988238813813170 already exists locally as part all_253_253_0; ignoring it.

delete

dependabot[bot] in ClickHouse/clickhouse-go delete branch dependabot/go_modules/github.com/paulmach/orb-0.7.1

deleted time in 39 minutes ago
push

gingerwizard push ClickHouse/clickhouse-go

gingerwizard
gingerwizard

Bump github.com/paulmach/orb from 0.7.0 to 0.7.1

Bumps github.com/paulmach/orb from 0.7.0 to 0.7.1.


updated-dependencies:

  • dependency-name: github.com/paulmach/orb
    dependency-type: direct:production
    update-type: version-update:semver-patch
    ...

Signed-off-by: dependabot[bot] [email protected]

gingerwizard
gingerwizard

Merge pull request #583 from ClickHouse/dependabot/go_modules/github.com/paulmach/orb-0.7.1

Bump github.com/paulmach/orb from 0.7.0 to 0.7.1

commit sha: 08fd610ece23b266a833a01cd5730e5b2c4d1a15

push time in 39 minutes ago
pull request

gingerwizard pull request ClickHouse/clickhouse-go

gingerwizard
gingerwizard

Bump github.com/paulmach/orb from 0.7.0 to 0.7.1

Bumps github.com/paulmach/orb from 0.7.0 to 0.7.1.

Release notes

Sourced from github.com/paulmach/orb's releases.

v0.7.1

v0.7.0 initially pointed to the wrong commit. After moving the tag there are some caching issues in GitHub actions. I hope this clears up the issue.

Changelog

Sourced from github.com/paulmach/orb's changelog.

v0.7.1 - 2022-05-16

No changes

The v0.7.0 tag was updated since it initially pointed to the wrong commit. This is causing caching issues.

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
open pull request

azat wants to merge ClickHouse/ClickHouse

azat
azat

Add total_max_threads parameter

Changelog category:

  • New Feature

Changelog entry

Add a total_max_threads parameter to improve performance under high RPS by limiting the total number of threads used by all queries.

Closes #36551

See the attached article for details: article.pdf

azat
azat

Not only disk-bound; a thread may just poll something once per millisecond or so.

pull request

azat merge to ClickHouse/ClickHouse

azat
azat

Add total_max_threads parameter

Changelog category:

  • New Feature

Changelog entry

Add a total_max_threads parameter to improve performance under high RPS by limiting the total number of threads used by all queries.

Closes #36551

See the attached article for details: article.pdf

open pull request

azat wants to merge ClickHouse/ClickHouse

azat
azat

Add total_max_threads parameter

Changelog category:

  • New Feature

Changelog entry

Add a total_max_threads parameter to improve performance under high RPS by limiting the total number of threads used by all queries.

Closes #36551

See the attached article for details: article.pdf

azat
azat

To me, functions can be left as they are now, since this is already attached to processes, so simply total_max_threads is fine.

pull request

azat merge to ClickHouse/ClickHouse

azat
azat

Add total_max_threads parameter

Changelog category:

  • New Feature

Changelog entry

Add a total_max_threads parameter to improve performance under high RPS by limiting the total number of threads used by all queries.

Closes #36551

See the attached article for details: article.pdf

push

mergify[bot] push ClickHouse/ClickHouse

mergify[bot]
mergify[bot]

add select implementation for MeiliSearch

mergify[bot]
mergify[bot]

rewrite SinkMeiliSearch using JSONRowOutputFormat

mergify[bot]
mergify[bot]

init commit with parsing and BAD realisation

mergify[bot]
mergify[bot]

rm unnessesary data committed by mistake

mergify[bot]
mergify[bot]

remove another unnessesary files

mergify[bot]
mergify[bot]

revert changes made to cube transform

mergify[bot]
mergify[bot]

erase blank line to restore initial state

mergify[bot]
mergify[bot]

feat grouping-sets: initial changes

mergify[bot]
mergify[bot]

commit sha: d5f870eac8e459362332f5da56aa3e485d924c19

push time in 41 minutes ago
issue

mergify[bot] issue comment ClickHouse/ClickHouse

mergify[bot]
mergify[bot]

Multiple client connection attempts if hostname resolves to multiple addresses

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Client will try every IP address returned by DNS resolution until successful connection.

closes #6698

mergify[bot]
mergify[bot]

update

✅ Branch has been successfully updated

issue

yakov-olkhovskiy issue comment ClickHouse/ClickHouse

yakov-olkhovskiy
yakov-olkhovskiy

Multiple client connection attempts if hostname resolves to multiple addresses

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Client will try every IP address returned by DNS resolution until successful connection.

closes #6698

pull request

KochetovNicolai merge to ClickHouse/ClickHouse

KochetovNicolai
KochetovNicolai

Speed up test 00157_cache_dictionary

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)
push

KochetovNicolai push ClickHouse/ClickHouse

KochetovNicolai
KochetovNicolai

tests/integration: add logging for NetworkManager

Signed-off-by: Azat Khuzhin [email protected]

KochetovNicolai
KochetovNicolai

tests/integration: use no-resolve and verbose for iptables --list

Signed-off-by: Azat Khuzhin [email protected]

KochetovNicolai
KochetovNicolai

tests/integration: remove superfluous import of PartitionManager

Signed-off-by: Azat Khuzhin [email protected]

KochetovNicolai
KochetovNicolai

tests/integration: fix possible race for iptables user rules inside containers

It is possible for the network PartitionManager to work incorrectly because of how Docker sets up the forward to the DOCKER-USER chain: it first removes the forward rule and then adds it back (see 1 and 2). This introduces a race for a short period of time, which is enough for TCP to retransmit packets and break the network PartitionManager.

Here are some details from logs for 3:

2022-04-27 03:01:00 [ 621 ] DEBUG : Executing query SELECT node FROM distributed_table ORDER BY node on node2 (cluster.py:2879, query_and_get_error)

This query fails, from the server logs:

2022.04.27 03:01:00.213101 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> executeQuery: (from 172.16.5.1:59008) SELECT node FROM distributed_table ORDER BY node
...
2022.04.27 03:01:03.578439 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> Connection (node1:9000): Sent data for 2 scalars, total 2 rows in 0.000284672 sec., 6993 rows/sec., 68.00 B (232.15 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (505.16 KiB/sec.)
2022.04.27 03:01:03.590637 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MergingSortedTransform: Merge sorted 3 blocks, 2 rows in 3.371592744 sec., 0.5931914533744174 rows/sec., 94.61 B/sec
2022.04.27 03:01:03.601256 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Information> executeQuery: Read 2 rows, 28.00 B in 3.387950542 sec., 0 rows/sec., 8.26 B/sec.
2022.04.27 03:01:03.601894 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MemoryTracker: Peak memory usage (for query): 334.38 KiB.

And from docker daemon log:

time="2022-04-27T03:00:59.916693113Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
time="2022-04-27T03:01:00.030654116Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
...
time="2022-04-27T03:01:03.515813984Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -n -L DOCKER-USER]"
time="2022-04-27T03:01:03.531106486Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C DOCKER-USER -j RETURN]"
time="2022-04-27T03:01:03.535442346Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-USER]"
time="2022-04-27T03:01:03.555856911Z" level=debug msg="/usr/sbin/iptables, [--wait -D FORWARD -j DOCKER-USER]"
time="2022-04-27T03:01:03.564905764Z" level=debug msg="/usr/sbin/iptables, [--wait -I FORWARD -j DOCKER-USER]"
...
time="2022-04-27T03:01:03.706374466Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
time="2022-04-27T03:01:03.968077970Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"

I've tried multiple ways of fixing this:

  • Creating a separate chain for rules from PartitionManager (DOCKER-USER-CLICKHOUSE). But it is created only once, and Docker places new rules on top of the FORWARD chain, so it will not work, since it will not receive any packets.

  • Use DOCKER-USER, but replace iptables with a wrapper ([script]) that ignores re-creation of the rule forwarding to DOCKER-USER. But this will not work either, since new Docker rules will be created on top of the FORWARD chain, and so DOCKER-USER will not receive any packets.

    [script]:

    if [[ "$" =~ "-D FORWARD -j DOCKER-USER" ]]; then exit 0 fi if [[ "$" =~ "-I FORWARD -j DOCKER-USER" ]]; then if iptables.real iptables -C FORWARD -j DOCKER-USER; then exit 0 fi fi

  • And the only way to avoid flakiness for this case is to forbid parallel execution for tests that use PartitionManager.

Signed-off-by: Azat Khuzhin [email protected]

KochetovNicolai
KochetovNicolai

tests/integration: add prefix match for skipped tests

This way you can specify only file/module, or test name without parameters.

Since at least one test that we care about (test_distributed_respect_user_timeouts/test.py::test_reconnect) was not run sequentially 1.

Signed-off-by: Azat Khuzhin [email protected]

KochetovNicolai
KochetovNicolai

Merge pull request #37138 from azat/integration-tests-iptables

tests/integration: fix possible race for iptables user rules inside containers

commit sha: a19d4c6f1fc49bfb4964c7288ba9373ff908306a

push time in 46 minutes ago
pull request

KochetovNicolai pull request ClickHouse/ClickHouse

KochetovNicolai
KochetovNicolai

tests/integration: fix possible race for iptables user rules inside containers

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

tests/integration: fix possible race for iptables user rules inside containers

TL;DR;

It is possible for the network PartitionManager to work incorrectly because of how Docker sets up the forward to the DOCKER-USER chain: it first removes the forward rule and then adds it back (see 1 and 2). This introduces a race for a short period of time, which is enough for TCP to retransmit packets and break the network PartitionManager.

Here are some details from logs for 3:

2022-04-27 03:01:00 [ 621 ] DEBUG : Executing query SELECT node FROM distributed_table ORDER BY node on node2 (cluster.py:2879, query_and_get_error)

This query fails, from the server logs:

2022.04.27 03:01:00.213101 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> executeQuery: (from 172.16.5.1:59008) SELECT node FROM distributed_table ORDER BY node
...
2022.04.27 03:01:03.578439 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> Connection (node1:9000): Sent data for 2 scalars, total 2 rows in 0.000284672 sec., 6993 rows/sec., 68.00 B (232.15 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (505.16 KiB/sec.)
2022.04.27 03:01:03.590637 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MergingSortedTransform: Merge sorted 3 blocks, 2 rows in 3.371592744 sec., 0.5931914533744174 rows/sec., 94.61 B/sec
2022.04.27 03:01:03.601256 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Information> executeQuery: Read 2 rows, 28.00 B in 3.387950542 sec., 0 rows/sec., 8.26 B/sec.
2022.04.27 03:01:03.601894 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MemoryTracker: Peak memory usage (for query): 334.38 KiB.

And from docker daemon log:

time="2022-04-27T03:00:59.916693113Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
time="2022-04-27T03:01:00.030654116Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
...
time="2022-04-27T03:01:03.515813984Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -n -L DOCKER-USER]"
time="2022-04-27T03:01:03.531106486Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C DOCKER-USER -j RETURN]"
time="2022-04-27T03:01:03.535442346Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-USER]"
time="2022-04-27T03:01:03.555856911Z" level=debug msg="/usr/sbin/iptables, [--wait -D FORWARD -j DOCKER-USER]"
time="2022-04-27T03:01:03.564905764Z" level=debug msg="/usr/sbin/iptables, [--wait -I FORWARD -j DOCKER-USER]"
...
time="2022-04-27T03:01:03.706374466Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
time="2022-04-27T03:01:03.968077970Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"

I've tried multiple ways of fixing this:

  • Creating a separate chain for rules from PartitionManager (DOCKER-USER-CLICKHOUSE). But it is created only once, and Docker places new rules on top of the FORWARD chain, so it will not work, since it will not receive any packets.

  • Use DOCKER-USER, but replace iptables with a wrapper ([script]) that ignores re-creation of the rule forwarding to DOCKER-USER. But this will not work either, since new Docker rules will be created on top of the FORWARD chain, and so DOCKER-USER will not receive any packets.

    [script]:

    if [[ "$*" =~ "-D FORWARD -j DOCKER-USER" ]]; then
        exit 0
    fi
    if [[ "$*" =~ "-I FORWARD -j DOCKER-USER" ]]; then
        if iptables.real iptables -C FORWARD -j DOCKER-USER; then
            exit 0
        fi
    fi
  • And the only way to avoid flakiness for this case is to forbid parallel execution for tests that use PartitionManager.

Fixes: #36541 (fixes the first problem; everything else had already been fixed) Refs: https://github.com/moby/moby/pull/43585

More CI:

issue

KochetovNicolai issue ClickHouse/ClickHouse

KochetovNicolai
KochetovNicolai

Weirdness with timeouts in StorageDistributed

https://s3.amazonaws.com/clickhouse-test-reports/36463/5d129e13ee8db1aab39e02f1778524cad315805d/integration_tests__asan__actions__[2/3].html

The test simply breaks connectivity between two nodes and checks that the connection timeout in StorageDistributed works: https://github.com/ClickHouse/ClickHouse/blob/3246261da8a3152bc0ecf0c8855120454d76877f/tests/integration/test_distributed_respect_user_timeouts/test.py#L174-L187

But it seems like something went wrong and PartitionManager did not break connectivity completely, so node2 successfully connected to node1 and sent some data (the following log is from node2):

2022.04.20 18:56:39.566138 [ 224 ] {224b5ee7-ad99-4575-98bb-f7dab9bbfb87} <Trace> Connection (node1:9000): Connecting. Database: (not specified). User: default

2022.04.20 18:56:42.569127 [ 224 ] {224b5ee7-ad99-4575-98bb-f7dab9bbfb87} <Warning> HedgedConnectionsFactory: Connection failed at try №1, reason: Code: 209. DB::NetException: Timeout: connect timed out: 172.16.5.3:9000 (node1:9000, receive timeout 0 ms, send timeout 0 ms). (SOCKET_TIMEOUT) (version 22.4.1.2246)
2022.04.20 18:56:42.569275 [ 224 ] {224b5ee7-ad99-4575-98bb-f7dab9bbfb87} <Trace> Connection (node1:9000): Connecting. Database: (not specified). User: default
2022.04.20 18:56:42.571173 [ 224 ] {224b5ee7-ad99-4575-98bb-f7dab9bbfb87} <Trace> Connection (node1:9000): Connected to ClickHouse server version 22.4.1.
2022.04.20 18:56:42.572354 [ 224 ] {224b5ee7-ad99-4575-98bb-f7dab9bbfb87} <Debug> Connection (node1:9000): Sent data for 2 scalars, total 2 rows in 0.000386693 sec., 5157 rows/sec., 68.00 B (171.22 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (372.59 KiB/sec.)

Then it hung for ~17 minutes (with no log messages) and finally failed:

2022.04.20 19:13:09.901008 [ 225 ] {224b5ee7-ad99-4575-98bb-f7dab9bbfb87} <Trace> StorageDistributed (distributed_table): () Cancelling query
2022.04.20 19:13:09.919001 [ 10 ] {224b5ee7-ad99-4575-98bb-f7dab9bbfb87} <Error> executeQuery: Code: 209. DB::NetException: Timeout exceeded while reading from socket (172.16.5.3:9000, 300000 ms): while receiving packet from node1:9000: While executing Remote. (SOCKET_TIMEOUT) (version 22.4.1.2246) (from 172.16.5.1:35710) (in query: SELECT node FROM distributed_table ORDER BY node), Stack trace (when copying this message, always include the lines below):

0. ./build_docker/../contrib/libcxx/include/exception:133: Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x391486e9 in /usr/bin/clickhouse
1. ./build_docker/../src/Common/Exception.cpp:58: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xd5890e8 in /usr/bin/clickhouse
2. ./build_docker/../src/Common/NetException.h:12: DB::ReadBufferFromPocoSocket::nextImpl() @ 0x2b925dc9 in /usr/bin/clickhouse
3. ./build_docker/../src/IO/ReadBuffer.h:86: void DB::readVarUIntImpl<false>(unsigned long&, DB::ReadBuffer&) @ 0xd65a87e in /usr/bin/clickhouse
4. ./build_docker/../src/IO/VarInt.h:0: DB::Connection::receivePacket() @ 0x2e10b03b in /usr/bin/clickhouse
5. ./build_docker/../src/Client/PacketReceiver.h:0: DB::PacketReceiver::Routine::operator()(boost::context::fiber&&) @ 0x2e157165 in /usr/bin/clickhouse
6. ./build_docker/../contrib/libcxx/include/__utility/swap.h:36: boost::context::detail::fiber_capture_record<boost::context::fiber, FiberStack&, DB::PacketReceiver::Routine>::run() @ 0x2e156321 in /usr/bin/clickhouse
7. ./build_docker/../contrib/boost/boost/context/fiber_ucontext.hpp:74: void boost::context::detail::fiber_entry_func<boost::context::detail::fiber_capture_record<boost::context::fiber, FiberStack&, DB::PacketReceiver::Routine> >(void*) @ 0x2e150eab in /usr/bin/clickhouse

The weird things are:

  1. How did node2 connect to node1 while PartitionManager was active? OK, it adds only one rule that drops only packets from node1 to node2, but I thought TCP requires some acknowledgment to establish a connection. Maybe something is wrong with the PartitionManager/iptables rules in our integration test environment.
  2. Why did it take ~17 minutes to get Timeout exceeded while reading from socket when the timeout is 300000 ms (5 minutes)? Also, 17 is not divisible by 5.
  3. Why did IConnections::dumpAddresses return an empty string? (StorageDistributed (distributed_table): () Cancelling query)
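
A hedged aside on question 2: the timeouts this test exercises can be pinned per query, e.g. (setting names exist in 22.x; the values are illustrative):

SELECT node FROM distributed_table ORDER BY node
SETTINGS connect_timeout_with_failover_ms = 1000, receive_timeout = 300, send_timeout = 300;
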
pull request

KochetovNicolai merge to ClickHouse/ClickHouse

KochetovNicolai
KochetovNicolai

tests/integration: fix possible race for iptables user rules inside containers

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

tests/integration: fix possible race for iptables user rules inside containers

TL;DR;

It is possible for the network PartitionManager to work incorrectly because of how Docker sets up the forward to the DOCKER-USER chain: it first removes the forward rule and then adds it back (see 1 and 2). This introduces a race for a short period of time, which is enough for TCP to retransmit packets and break the network PartitionManager.

Here are some details from logs for 3:

2022-04-27 03:01:00 [ 621 ] DEBUG : Executing query SELECT node FROM distributed_table ORDER BY node on node2 (cluster.py:2879, query_and_get_error)

This query fails, from the server logs:

2022.04.27 03:01:00.213101 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> executeQuery: (from 172.16.5.1:59008) SELECT node FROM distributed_table ORDER BY node
...
2022.04.27 03:01:03.578439 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> Connection (node1:9000): Sent data for 2 scalars, total 2 rows in 0.000284672 sec., 6993 rows/sec., 68.00 B (232.15 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (505.16 KiB/sec.)
2022.04.27 03:01:03.590637 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MergingSortedTransform: Merge sorted 3 blocks, 2 rows in 3.371592744 sec., 0.5931914533744174 rows/sec., 94.61 B/sec
2022.04.27 03:01:03.601256 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Information> executeQuery: Read 2 rows, 28.00 B in 3.387950542 sec., 0 rows/sec., 8.26 B/sec.
2022.04.27 03:01:03.601894 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MemoryTracker: Peak memory usage (for query): 334.38 KiB.

And from docker daemon log:

time="2022-04-27T03:00:59.916693113Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
time="2022-04-27T03:01:00.030654116Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
...
time="2022-04-27T03:01:03.515813984Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -n -L DOCKER-USER]"
time="2022-04-27T03:01:03.531106486Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C DOCKER-USER -j RETURN]"
time="2022-04-27T03:01:03.535442346Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-USER]"
time="2022-04-27T03:01:03.555856911Z" level=debug msg="/usr/sbin/iptables, [--wait -D FORWARD -j DOCKER-USER]"
time="2022-04-27T03:01:03.564905764Z" level=debug msg="/usr/sbin/iptables, [--wait -I FORWARD -j DOCKER-USER]"
...
time="2022-04-27T03:01:03.706374466Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
time="2022-04-27T03:01:03.968077970Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"

I've tried multiple ways of fixing this:

  • Creating a separate chain for rules from PartitionManager (DOCKER-USER-CLICKHOUSE). But it is created only once, and Docker places new rules on top of the FORWARD chain, so it will not work, since it will not receive any packets.

  • Use DOCKER-USER, but replace iptables with a wrapper ([script]) that ignores re-creation of the rule forwarding to DOCKER-USER. But this will not work either, since new Docker rules will be created on top of the FORWARD chain, and so DOCKER-USER will not receive any packets.

    [script]:

    if [[ "$*" =~ "-D FORWARD -j DOCKER-USER" ]]; then
        exit 0
    fi
    if [[ "$*" =~ "-I FORWARD -j DOCKER-USER" ]]; then
        if iptables.real iptables -C FORWARD -j DOCKER-USER; then
            exit 0
        fi
    fi
  • And the only way to avoid flakiness for this case is to forbid parallel execution for tests that use PartitionManager.

Fixes: #36541 (fixes the first problem; everything else had already been fixed) Refs: https://github.com/moby/moby/pull/43585

More CI:

KochetovNicolai
KochetovNicolai

LGTM, let's execute those tests in-order

open pull request

vdimir wants to merge ClickHouse/ClickHouse

vdimir
vdimir

Release without prestable

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Do not create prestable release
  • to be continued

Relates to #34900

vdimir
vdimir

Seems self._create_gh_release(True) is not called anymore, so can we remove the argument?

pull request

vdimir merge to ClickHouse/ClickHouse

vdimir
vdimir

Release without prestable

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Do not create prestable release
  • to be continued

Relates to #34900

vdimir
vdimir

So we want to get rid of prestable and have testing and stable. Did I get it right?

pull request

vdimir merge to ClickHouse/ClickHouse

vdimir
vdimir

Release without prestable

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Do not create prestable release
  • to be continued

Relates to #34900

vdimir
vdimir

So we want to get rid of prestable and have testing and stable. Did I get it right?

pull request

zhicwu pull request ClickHouse/clickhouse-jdbc

zhicwu
zhicwu

Add CLI client

clickhouse-cli-client is a wrapper around the ClickHouse native command-line client.

push

alesapin push ClickHouse/ClickHouse

commit sha: 19462bdf9e96fd1271a96e827f683c656d907a56

push time in 51 minutes ago
pull request

arthurpassos pull request ClickHouse/clickhouse-cpp

arthurpassos
arthurpassos

Add empty arrays to LC(Array) existing unit test

In https://github.com/ClickHouse/clickhouse-cpp/issues/178 it was reported that empty arrays in LC(Array) were crashing the client. This PR adds empty arrays to the existing unit test.

Not closing the issue because I am waiting for more information from the OP.
