zhangjinpeng1987

Member since 5 years ago

95 followers · 2 following · 8 stars · 31 repos

72 contributions in the last year

Pinned
⚡ A toy storage engine used to compare different MVCC storage models.
⚡ TiDB is an open source distributed HTAP database compatible with the MySQL protocol
⚡ A library that provides an embeddable, persistent key-value store for fast storage.
⚡ Distributed transactional key-value database, originally created to complement TiDB
⚡ A persistent storage engine for Multi-Raft log
⚡ Placement driver for TiKV
Activity
Dec 3 · 2 days ago
push

zhangjinpeng1987 pushed to zhangjinpeng1987/tikv

tests: correct the test time range

Signed-off-by: zhangjinpeng1987 [email protected]

commit sha: 8b1f080bb4a9ce4d946cdf5aaffa273bb8bdf70b

pushed 2 days ago
Dec 2 · 3 days ago
push

zhangjinpeng1987 pushed to zhangjinpeng1987/tikv

address comments: consider time span and also consider records count

Signed-off-by: zhangjinpeng1987 [email protected]

commit sha: a111e733b532ffc5dd3d6b0558f247f0a64a6aee

pushed 3 days ago
push

zhangjinpeng1987 pushed to zhangjinpeng1987/tikv

comments: remove tail space

Signed-off-by: zhangjinpeng1987 [email protected]

commit sha: 10e0789b52407ee7f7f2af64aa9dd3d6673d1f36

pushed 3 days ago
push

zhangjinpeng1987 pushed to zhangjinpeng1987/tikv

adjust the smoother capacity by time

Signed-off-by: zhangjinpeng1987 [email protected]

commit sha: 8c463cb5edf5a432cd9bf7985530e2dc9bd4661f

pushed 3 days ago
push

zhangjinpeng1987 pushed to zhangjinpeng1987/tikv

fix return syntax

Signed-off-by: zhangjinpeng1987 [email protected]

commit sha: 20b9aa99a807aac22895b48e9163d82b615d353e

pushed 3 days ago
Dec 1 · 4 days ago
pull request

zhangjinpeng1987 opened a pull request in tikv/tikv

flow controller: consider the time factor when evaluating the trending

Signed-off-by: zhangjinpeng1987 [email protected]

What is changed and how it works?

Close #11530

What's Changed:

Consider the time factor when evaluating the trend of pending compaction bytes or the number of pending memtables.
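A minimal sketch of the idea in Rust, illustrative only and not the actual flow_controller.rs code: compute the trend as a rate over the elapsed wall-clock time between the first and last samples, instead of over the sample count alone.

```rust
use std::time::Instant;

/// One recorded observation of pending compaction bytes.
struct Record {
    at: Instant,
    pending_bytes: u64,
}

/// Estimate the growth rate of pending compaction bytes in bytes/second.
/// Dividing by the real time span (not the record count) keeps the estimate
/// comparable whether samples arrive every 100ms or every few seconds.
fn trend(records: &[Record]) -> f64 {
    if records.len() < 2 {
        return 0.0;
    }
    let (first, last) = (&records[0], &records[records.len() - 1]);
    let span = last.at.duration_since(first.at).as_secs_f64();
    if span <= 0.0 {
        return 0.0;
    }
    // Positive: pending bytes are growing; negative: compaction is catching up.
    (last.pending_bytes as f64 - first.pending_bytes as f64) / span
}
```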

Related changes

  • No

Check List

Tests

  • Unit test

Side effects

  • No

Release note

None
created branch
created 4 days ago
issue

zhangjinpeng1987 opened an issue in tikv/tikv

flow_controller.rs trend function didn't consider the time range

Development Task

  • The flow_controller.rs trend function doesn't consider the time range, so it may be inaccurate when evaluating the trend of pending compaction bytes or the number of pending memtables.
issue

zhangjinpeng1987 commented on an issue in pingcap/tidb

The formula of calculating net cost for TiFlash plans is not aligned with TiKV plans'

Enhancement

In copTask.convertToRootTaskImpl, we can see that the formula for calculating the net cost of TiKV plans is est-row * row-size * net-factor, while in mppTask.convertToRootTaskImpl the formula for TiFlash plans is est-row * net-factor, where the row-size term is lost. That makes the cost of TiFlash plans always relatively lower than that of TiKV plans, which can cause some wrong plan choices.
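An illustrative sketch of the discrepancy (TiDB's planner is Go; these Rust function names are made up for comparison only):

```rust
/// Net cost as computed for TiKV plans in copTask.convertToRootTaskImpl.
fn tikv_net_cost(est_rows: f64, row_size: f64, net_factor: f64) -> f64 {
    est_rows * row_size * net_factor
}

/// Net cost as computed for TiFlash plans in mppTask.convertToRootTaskImpl:
/// the row-size term is missing, so for any row size above 1 the TiFlash
/// plan looks artificially cheap compared with the equivalent TiKV plan.
fn tiflash_net_cost(est_rows: f64, net_factor: f64) -> f64 {
    est_rows * net_factor
}
```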

Nov 29 · 6 days ago
created branch

zhangjinpeng1987 created branch latency-requirement in zhangjinpeng1987/kvproto

created 6 days ago
Nov 28 · 1 week ago
issue

zhangjinpeng1987 opened an issue in tikv/tikv

Supports Witness

Development Task

  • Add RFC(Design)
  • kvproto supports witness
  • Raftstore supports witness
    • Witness replica doesn't apply the raft log
    • Witness replica keeps enough raft logs
    • Leader election: handle the case where the leader replica fails and a witness replica has the latest log
  • Tools compatibility
    • BR compatibility
    • CDC compatibility
    • Lightning compatibility
issue

zhangjinpeng1987 opened an issue in tikv/tikv

Support witness replica

Feature Request

Is your feature request related to a problem? Please describe:

  • Currently, TiKV maintains 3 replicas for each range of data to guarantee high availability and high durability: 1 leader replica and 2 follower replicas. Because both the leader and the followers must apply the raft log to the state machine, the IO and CPU cost of maintaining 1 leader replica and 2 follower replicas cannot be ignored.

Describe the feature you'd like:

  • If there were a log-only replica that just keeps raft logs and does not apply them to the state machine, it would help reduce the cost of maintaining 3 replicas. A witness replica is such a replica: it only keeps the raft log but doesn't apply it to the state machine, as sketched below.
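A hypothetical sketch of that behavior in Rust; the type and method names are illustrative, not TiKV's actual raftstore code:

```rust
struct Entry {
    index: u64,
    data: Vec<u8>,
}

struct Peer {
    is_witness: bool,
}

impl Peer {
    /// Handle newly committed raft entries.
    fn on_committed_entries(&mut self, entries: Vec<Entry>) {
        // Every replica, witness or not, persists the log entries, so the
        // group keeps its durability and quorum guarantees.
        self.append_to_raft_log(&entries);
        if self.is_witness {
            // A witness keeps the raft log but never applies it to the state
            // machine, saving the apply-side IO and CPU of a full replica.
            return;
        }
        for entry in entries {
            self.apply_to_state_machine(entry);
        }
    }

    fn append_to_raft_log(&mut self, _entries: &[Entry]) {
        // Persist entries to the raft log engine (omitted).
    }

    fn apply_to_state_machine(&mut self, _entry: Entry) {
        // Apply a committed entry to the KV state machine (omitted).
    }
}
```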

Describe alternatives you've considered:

  • 2 replicas: but this alternative doesn't provide high availability.

Teachability, Documentation, Adoption, Migration Strategy:

issue

zhangjinpeng1987 commented on an issue in tikv/tikv

tikv oom when run sysbench prepare while query under stress

Bug Report

What version of TiKV are you using?

sh-4.2# ./tikv-server -V
TiKV Release Version: 5.3.0-alpha
Edition: Community
Git Commit Hash: 500ead07966474d280f652dafc4fc4856b99d9e2
Git Commit Branch: heads/refs/tags/v5.3.0
UTC Build Time: 2021-11-09 11:32:30
Rust Version: rustc 1.56.0-nightly (2faabf579 2021-07-27)
Enable Features: jemalloc mem-profiling portable sse protobuf-codec test-engines-rocksdb cloud-aws cloud-gcp
Profile: dist_release

What operating system and CPU are you using?

8 cores, 16 GB memory

Steps to reproduce

TiKVWorkloadStress003: run sysbench prepare and select *

What did you expect?

All instances are normal.

What happened?

One TiKV instance OOMed several times (screenshots omitted).

Activity icon
issue

zhangjinpeng1987 commented on an issue in tikv/tikv

QPS dropped severely many times during scale out in DBaaS

Bug Report

What version of TiKV are you using?

5.3.0

What operating system and CPU are you using?

Cluster Tier T3.standard

Steps to reproduce

1. Run sysbench prepare: sysbench --config-file=config oltp_insert --tables=16 --table-size=100000000 prepare threads=8
2. Scale out TiKV from 3 nodes to 6

What did you expect?

Performance is stable.

What happened?

(screenshots omitted)

Nov 27 · 1 week ago
open pull request

zhangjinpeng1987 reviewed a pull request in tikv/tikv

raftstore: Fetch raft log in async manner

Signed-off-by: Connor1996 [email protected]

What problem does this PR solve?

Issue Number: close https://github.com/tikv/tikv/issues/11320

Problem Summary:

What is changed and how it works?

Proposal: xxx

What's Changed:

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • PR to update pingcap/tidb-ansible:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

Please add a release note.
If you don't think this PR needs a release note then fill it with None.
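A hypothetical sketch of the async approach; the names below are illustrative, not this PR's actual code. The point is that the blocking disk read moves off the raftstore thread to a background worker, which sends the entries back when ready:

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// A request to fetch raft log entries [low, high) for one region.
struct FetchTask {
    region_id: u64,
    low: u64,
    high: u64,
    // Channel used to deliver the fetched entries back to the raftstore loop.
    on_done: Sender<(u64, Vec<Vec<u8>>)>,
}

/// Spawn a background worker that serves fetch tasks so the raftstore
/// thread never blocks on a disk read.
fn spawn_raftlog_fetcher() -> Sender<FetchTask> {
    let (tx, rx) = channel::<FetchTask>();
    thread::spawn(move || {
        for task in rx {
            // Stand-in for reading entries from the raft log engine.
            let entries: Vec<Vec<u8>> = (task.low..task.high)
                .map(|idx| format!("entry-{}", idx).into_bytes())
                .collect();
            let _ = task.on_done.send((task.region_id, entries));
        }
    });
    tx
}
```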

This filename is better to be 'raftlog_fetcher.rs'?

pull request

zhangjinpeng1987 reviewed a pull request in tikv/tikv

raftstore: Fetch raft log in async manner
Nov 26 · 1 week ago
delete

zhangjinpeng1987 deleted branch duration-composition in zhangjinpeng1987/tikv

deleted 1 week ago
issue

zhangjinpeng1987 opened an issue in tikv/tikv

It is hard to locate the root cause of latency jitter issues

Development Task

  • Add a unified duration composition metric to make it easy to locate the biggest latency contributor at a glance.
pull request

zhangjinpeng1987 opened a pull request in tikv/tikv

metrics: add duration composition panel

Signed-off-by: zhangjinpeng1987 [email protected]

What problem does this PR solve?

Problem Summary: It is not easy for users to diagnose the root cause of latency jitter quickly; they can get lost in the large number of metrics, which require a deep professional understanding of TiKV.

What is changed and how it works?

What's Changed: Add a "Duration Composition" panel to show the duration composition of write requests and coprocessor requests, making it easy to locate the biggest duration contributor at a glance (see the sketch below).

(screenshot of the panel omitted)
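A minimal sketch of the underlying idea, assuming the prometheus and lazy_static crates; the metric name and stage labels are illustrative, not TiKV's actual metric schema:

```rust
use lazy_static::lazy_static;
use prometheus::{register_histogram_vec, HistogramVec};

lazy_static! {
    // One histogram family with a "stage" label; a stacked Grafana panel over
    // this metric shows which stage dominates request latency at a glance.
    static ref REQUEST_STAGE_DURATION: HistogramVec = register_histogram_vec!(
        "request_stage_duration_seconds",
        "Time spent in each stage of a request",
        &["stage"] // e.g. "scheduler_wait", "raft_propose", "apply", "coprocessor"
    )
    .unwrap();
}

/// Record how long a request spent in one stage.
fn observe_stage(stage: &str, seconds: f64) {
    REQUEST_STAGE_DURATION
        .with_label_values(&[stage])
        .observe(seconds);
}
```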

Related changes

Check List

Tests

  • No code

Side effects

  • No

Release note

None
created branch

zhangjinpeng1987 created branch duration-composition in zhangjinpeng1987/tikv

created 1 week ago
Nov 22 · 1 week ago
issue

zhangjinpeng1987 commented on an issue in tikv/tikv

Run sysbench update: after a slow node appears, QPS recovery is too slow due to the detection algorithm

Bug Report

What version of TiKV are you using?

[[email protected] bin]#
5.1.0-alpha
Edition: Community
Git Commit Hash: f67aa380277b333a5b879f7837af51a9bd4ee0b8
Git Commit Branch: slow-store-fix-5.2
UTC Build Time: 2021-08-06 06:39:43
Rust Version: rustc 1.56.0-nightly (2faabf579 2021-07-27)
Enable Features: jemalloc mem-profiling portable sse protobuf-codec test-engines-rocksdb cloud-aws cloud-gcp
Profile: release

What operating system and CPU are you using?

tiup: 1 PD, 1 TiDB, 3 TiKV; all nodes: 500G NVMe, 4 cores, 8G mem

Steps to reproduce

1. tiup br:v5.2.0 restore full --pd="172.16.6.193:2379" --storage s3://benchmark/sysbench-32-300G-release-4.0 --s3.endpoint http://minio.pingcap.net:9000 --send-credentials-to-tikv=true
2. sysbench --config-file=config_test oltp_write_only --tables=32 --table-size=10000000 run threads=512
3. Inject CPU limit (at 15:15):
   echo 1000000 > /sys/fs/cgroup/cpu,cpuacct/g3/cpu.cfs_quota_us
   echo 1000000 > /sys/fs/cgroup/cpu,cpuacct/g3/cpu.cfs_period_us
   echo $(pgrep tikv-server) > /sys/fs/cgroup/cpu,cpuacct/g3/cgroup.procs

What did you expect?

After a slow node appears, QPS resumes within 10 minutes

What happened?

QPS only resumed after about 25 minutes (screenshots omitted).


What is the root cause of this issue? Is it because of the transfer leader backoff mechanism or CPU limitation in the heavy workload?

issue

zhangjinpeng1987 opened an issue in tikv/tikv

Truncate table while running sysbench insert: one TiKV panicked four times

Bug Report

What version of TiKV are you using?

TiKV Release Version: 5.2.0
Edition: Community
Git Commit Hash: dbefa5ec66ccb4cb0cfeef3ad05d5932da2aeb36
Git Commit Branch: heads/refs/tags/v5.2.0
UTC Build Time: 2021-08-23 16:05:41
Rust Version: rustc 1.56.0-nightly (2faabf579 2021-07-27)
Enable Features: jemalloc mem-profiling portable sse protobuf-codec test-engines-rocksdb cloud-aws cloud-gcp
Profile: dist_release

What operating system and CPU are you using?

k8s: 2 TiDB, 3 PD, 4 TiKV (4 cores, 8G mem)

Steps to reproduce

TiKVWorkloadStress004
1. Import data (32 tables): /br restore full --pd tc-pd.glh-hqgsq.svc:2379 --storage s3://benchmark/sysbench-32-300G-release-4.0 --s3.endpoint http://minio.pingcap.net:9000 --send-credentials-to-tikv=true
2. Run sysbench prepare: tabNum=512 threads=8
3. After 5 minutes, truncate the tables created in step 1 (32 tables)

What did you expect?

TiKV is normal.

What happened?

One TiKV instance (tikv0) panicked four times (from the monitor, memory usage was not high; screenshots omitted).

logs:

[FATAL] [lib.rs:465] ["rocksdb background error. db: kv, reason: compaction, error: IO error: While appending to file: /var/lib/tikv/data/db/438720.sst: Cannot allocate memory"]
stack backtrace:
   0: tikv_util::set_panic_hook::{{closure}} at components/tikv_util/src/lib.rs:464
   1: std::panicking::rust_panic_with_hook at library/std/src/panicking.rs:626
   2: std::panicking::begin_panic_handler::{{closure}} at library/std/src/panicking.rs:519
   3: std::sys_common::backtrace::__rust_end_short_backtrace at library/std/src/sys_common/backtrace.rs:141
   4: rust_begin_unwind at library/std/src/panicking.rs:515
   5: std::panicking::begin_panic_fmt at library/std/src/panicking.rs:457
   6: <engine_rocks::event_listener::RocksEventListener as rocksdb::event_listener::EventListener>::on_background_error
   7: rocksdb::event_listener::on_background_error at rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/4e912a8/src/event_listener.rs:340
   8: _ZN24crocksdb_eventlistener_t17OnBackgroundErrorEN7rocksdb21BackgroundErrorReasonEPNS0_6StatusE at crocksdb/c.cc:2352
   9: _ZN7rocksdb12EventHelpers23NotifyOnBackgroundErrorERKSt6vectorISt10shared_ptrINS_13EventListenerEESaIS4_EENS_21BackgroundErrorReasonEPNS_6StatusEPNS_17InstrumentedMutexEPb at rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/4e912a8/librocksdb_sys/rocksdb/db/event_helpers.cc:53
  10: _ZN7rocksdb12ErrorHandler10SetBGErrorERKNS_6StatusENS_21BackgroundErrorReasonE at rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/4e912a8/librocksdb_sys/rocksdb/db/error_handler.cc:219
  11: _ZN7rocksdb6DBImpl20BackgroundCompactionEPbPNS_10JobContextEPNS_9LogBufferEPNS0_19PrepickedCompactionENS_3Env8PriorityE at rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/4e912a8/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2799
  12: _ZN7rocksdb6DBImpl24BackgroundCallCompactionEPNS0_19PrepickedCompactionENS_3Env8PriorityE at rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/4e912a8/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2319
  13: _ZN7rocksdb6DBImpl16BGWorkCompactionEPv at rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/4e912a8/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2093
  14: _ZNKSt8functionIFvvEEclEv at opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:687
      _ZN7rocksdb14ThreadPoolImpl4Impl8BGThreadEm at rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/4e912a8/librocksdb_sys/rocksdb/util/threadpool_imp.cc:266
  15: _ZN7rocksdb14ThreadPoolImpl4Impl15BGThreadWrapperEPv at rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/4e912a8/librocksdb_sys/rocksdb/util/threadpool_imp.cc:307
  16: execute_native_thread_routine
  17:
  18: clone
[location=components/engine_rocks/src/event_listener.rs:108] [thread_name=]
[2021/08/24 05:41:39.302 +00:00] [INFO] [lib.rs:80] ["Welcome to TiKV"]
[2021/08/24 05:41:39.302 +00:00] [INFO] [lib.rs:85] ["Release Version: 5.2.0"]
[2021/08/24 05:41:39.302 +00:00] [INFO] [lib.rs:85] ["Edition: Community"]
[2021/08/24 05:41:39.302 +00:00] [INFO] [lib.rs:85] ["Git Commit Hash: dbefa5ec66ccb4cb0cfeef3ad05d5932da2aeb36"]
[2021/08/24 05:41:39.302 +00:00] [INFO] [lib.rs:85] ["Git Commit Branch: heads/refs/tags/v5.2.0"]
[2021/08/24 05:41:39.302 +00:00] [INFO] [lib.rs:85] ["UTC Build Time: Unknown (env var does not exist when building)"]
[2021/08/24 05:41:39.302 +00:00] [INFO] [lib.rs:85] ["Rust Version: rustc 1.56.0-nightly (2faabf579 2021-07-27)"]
[2021/08/24 05:41:39.302 +00:00] [INFO] [lib.rs:85] ["Enable Features: jemalloc mem-profiling portable sse protobuf-codec test-engines-rocksdb cloud-aws cloud-gcp"]
[2021/08/24 05:41:39.302 +00:00] [INFO] [lib.rs:85] ["Profile: dist_release"]

issue

zhangjinpeng1987 opened an issue in tikv/tikv

*: Logger thread can be a bottleneck and slow down the whole TiKV instance

Bug Report

What version of TiKV are you using?

4.0.14

What operating system and CPU are you using?

Linux x86_64

Steps to reproduce

Send an extreme coprocessor load to a single region to trigger an enormous amount of slow-log output that can overload the slogger thread. Once the slogger thread is overloaded, log messages accumulate and fill up the pending message queue quickly. Every code path that generates logs is affected and can't make progress as usual.

What did you expect?

Send less slow log to the slogger in the first place to make it less likely to collapse, and add a mechanism to short-circuit non-critical logs when the slogger thread is heavily loaded (see the sketch below).

What did happened?

The slogger thread uses 100% CPU and slows everything down, especially the coprocessor.
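A minimal sketch of such a short-circuit, using a std bounded channel rather than TiKV's actual slog async drain; all names are illustrative:

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

#[derive(Debug, PartialEq)]
enum Level {
    Critical,
    Warn,
    Info,
}

struct LogMsg {
    level: Level,
    text: String,
}

struct ShortCircuitLogger {
    tx: SyncSender<LogMsg>,
}

impl ShortCircuitLogger {
    /// Bound the pending queue so a slow logger thread cannot grow it forever.
    fn new(capacity: usize) -> (Self, Receiver<LogMsg>) {
        let (tx, rx) = sync_channel(capacity);
        (ShortCircuitLogger { tx }, rx)
    }

    fn log(&self, msg: LogMsg) {
        match self.tx.try_send(msg) {
            Ok(()) => {}
            // Queue full: the logger thread is overloaded. Drop everything
            // below Critical so foreground threads keep making progress.
            Err(TrySendError::Full(m)) => {
                if m.level == Level::Critical {
                    let _ = self.tx.send(m); // block only for critical records
                }
            }
            Err(TrySendError::Disconnected(_)) => {}
        }
    }
}
```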

open pull request

zhangjinpeng1987 reviewed a pull request in tikv/rfcs

RFC: In-memory Pessimistic Locks

This is a more aggressive optimization than pipelined pessimistic locking. It puts pessimistic locks only in memory, without replicating them through Raft, while not decreasing the success rate of pessimistic transactions.

According to preliminary tests, this optimization reduces disk write flow and raftstore CPU by about 20%.

cc @HunDunDM @youjiali1995 @gengliqi @cfzjywxk @MyonKeminta @longfangsong
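A hypothetical sketch of the core idea, not the RFC's actual design: the leader keeps pessimistic locks in an in-memory table instead of proposing them through Raft.

```rust
use std::collections::HashMap;

/// A pessimistic lock held only in leader memory.
struct PessimisticLock {
    start_ts: u64,    // locking transaction's start timestamp
    primary: Vec<u8>, // primary key of the transaction
}

#[derive(Default)]
struct InMemoryLockTable {
    locks: HashMap<Vec<u8>, PessimisticLock>,
}

impl InMemoryLockTable {
    /// Acquire a lock without a Raft proposal, so no disk write or raftstore
    /// CPU is spent. Because the lock lives only in leader memory, it can be
    /// lost on leader transfer or crash; the transaction layer must tolerate
    /// that, which is why the RFC stresses keeping the success rate of
    /// pessimistic transactions unchanged.
    fn acquire(&mut self, key: Vec<u8>, lock: PessimisticLock) -> Result<(), u64> {
        match self.locks.get(&key) {
            // Another transaction already holds the lock: report a conflict.
            Some(existing) if existing.start_ts != lock.start_ts => Err(existing.start_ts),
            _ => {
                self.locks.insert(key, lock);
                Ok(())
            }
        }
    }
}
```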


Pipelined pessimistic lock is off by default, so this feature will not be enabled by default either?

pull request

zhangjinpeng1987 reviewed a pull request in tikv/rfcs

RFC: In-memory Pessimistic Locks

Nov 20 · 2 weeks ago
pull request

zhangjinpeng1987 reviewed a pull request in tikv/rfcs

RFC: In-memory Pessimistic Locks

open pull request

zhangjinpeng1987 reviewed a pull request in tikv/rfcs

RFC: In-memory Pessimistic Locks


What about the case where the leader encounters network isolation?
