We will use ChampSim to evaluate the effectiveness of your ML prefetchers. Your
prefetching models will be trained using the Load Traces that we provide (details below),
and they will generate an Output File with a list of prefetches that will be fed back into
ChampSim to compute coverage, accuracy, and instructions per cycle (IPC).
The traces can be found at this link.
Alternatively, the download.sh file can be used to download all of the files to
avoid bulk download restrictions from Box. You can also use the information found
in the download_links file to download the data in another fashion.
There are two types of traces that can be found here:
Load traces under the folder LoadTraces that you will use to train your ML models. A
load trace is a series of a program's LLC accesses, and the trace format is as follows:
Unique Instr Id, Cycle Count, Load Address, Instruction Pointer of the Load, LLC hit/miss
The load traces are plain text CSV.
Execution traces under the folder ChampSimTraces that ChampSim will need to
compute IPC. You do not need these traces to train your models; they are
only provided to facilitate an evaluation using IPC. Note that you should not
unzip the execution traces, as ChampSim expects them to be in the zipped format.
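As a rough illustration of the load trace format above, the following sketch reads the five CSV fields into named records. It assumes that addresses and instruction pointers are written in hexadecimal and that a final field of 1 means an LLC hit; neither is stated above, so check both against the actual files. The LoadAccess type, the parse_load_trace helper, and the sample values are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class LoadAccess(NamedTuple):
    instr_id: int   # unique instruction ID
    cycle: int      # cycle count
    address: int    # load address
    ip: int         # instruction pointer of the load
    llc_hit: bool   # LLC hit/miss flag


def parse_load_trace(lines: Iterable[str], addr_base: int = 16) -> Iterator[LoadAccess]:
    """Yield one LoadAccess per non-empty CSV row.

    ASSUMPTION: addresses and IPs are base-16 text, and "1" marks a hit.
    """
    for row in csv.reader(lines):
        if not row:
            continue
        instr_id, cycle, addr, ip, hit = (field.strip() for field in row)
        yield LoadAccess(int(instr_id), int(cycle),
                         int(addr, addr_base), int(ip, addr_base), hit == "1")


# Made-up sample rows, for illustration only.
sample = io.StringIO("3659, 100, 7f00a0, 401c2d, 1\n5433, 250, 7f00e0, 401c31, 0\n")
accesses = list(parse_load_trace(sample))
```

For a real trace, an open file object can be passed in place of the StringIO sample.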
For a given Load Trace, your code should generate an output file that contains one
prefetch per line. Each line should consist of two space-separated integral
values, the unique instruction ID for which you want to issue a prefetch and the
load address you want to prefetch. The unique instruction ID corresponds to
the ID of the triggering load in the input Load Trace. You can include up to two
prefetches per load listed in the Load Trace. You can choose not to prefetch
for a load. Note that the prefetches should be in the order that they occur in the trace.
Should you exceed the maximum number of prefetches per load, the first two will
be kept and the remaining excess prefetches for that load will be discarded.
For example, consider a Load Trace as follows:
3659 cycle1 A ip1 1
5433 cycle2 B ip2 0
6928 cycle3 C ip3 0
Your output file could look something like this:
3659 A+1 # Issue first prefetch for Instruction 3659
3659 A+2 # Issue second prefetch for Instruction 3659
5433 B+8 # Issue only one prefetch for Instruction 5433
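The two-prefetches-per-load rule can be applied on your side before writing the file. The sketch below keeps the first two prefetches for each instruction ID and preserves the input order, so it assumes the candidate list is already in trace order. Writing the address in hexadecimal is an assumption (the text above only says "integral values"), and both helper names are hypothetical.

```python
from collections import defaultdict


def limit_prefetches(prefetches, max_per_load=2):
    """Keep at most max_per_load prefetches per instruction ID, in input order."""
    kept = []
    counts = defaultdict(int)
    for instr_id, address in prefetches:
        if counts[instr_id] < max_per_load:
            counts[instr_id] += 1
            kept.append((instr_id, address))
    return kept


def format_prefetch_lines(prefetches):
    """Render one 'instruction_id address' line per prefetch.

    ASSUMPTION: the address is written as hexadecimal without a 0x prefix.
    """
    return [f"{instr_id} {address:x}" for instr_id, address in prefetches]


# Three candidates for load 3659: the third one is dropped.
candidates = [(3659, 0x1001), (3659, 0x1002), (3659, 0x1003), (5433, 0x2008)]
lines = format_prefetch_lines(limit_prefetches(candidates))
```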
Building, running, and evaluating have been rolled into one script, ml_prefetch_sim.py.
Some common use cases are highlighted below, but more information about each of the
subcommands (build, run, and eval) can be found by running:
The following command will compile two ChampSim binaries: (1) a ChampSim binary
that reads your ML model's output from a file and uses that as a prefetcher,
and (2) a ChampSim binary with no prefetching that is to be used as a baseline.
To use the above, you need to modify the model.py file with your model. The
number of prefetch warm-up instructions specifies how many instructions are
included in the training set. The remaining instructions form the evaluation set.
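One way to picture the training/evaluation split is shown below. This sketch assumes the split is made by comparing each load's unique instruction ID against the warm-up instruction count, which is an interpretation of the description above rather than a confirmed detail of ml_prefetch_sim.py. The split_by_warmup helper and the threshold value are hypothetical.

```python
def split_by_warmup(accesses, warmup_instructions):
    """Split trace records into (training, evaluation) lists.

    ASSUMPTION: a record belongs to the training set when its unique
    instruction ID (first field) is below the warm-up instruction count.
    """
    train, evaluation = [], []
    for access in accesses:
        instr_id = access[0]
        if instr_id < warmup_instructions:
            train.append(access)
        else:
            evaluation.append(access)
    return train, evaluation


# Records reuse the instruction IDs from the example trace above.
trace = [(3659, "A"), (5433, "B"), (6928, "C")]
train_set, eval_set = split_by_warmup(trace, warmup_instructions=6000)
```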
To test how submissions generalize, our test set evaluation will have two components:
Undisclosed execution samples for the training traces: You can submit a
pre-trained model for each benchmark in the training set, and we will
evaluate it on a different sample of the same benchmark
Undisclosed benchmarks: We will train and test your model on unseen
benchmarks using the training routines that you provide
ChampSim is a trace-based simulator for microarchitecture studies. You can sign up to the public mailing list by sending an empty mail to [email protected]. Traces for the 3rd Data Prefetching Championship (DPC-3) can be found here (https://dpc3.compas.cs.stonybrook.edu/?SW_IS). A set of traces used for the 2nd Cache Replacement Championship (CRC-2) can be found at this link (http://bit.ly/2t2nkUj).
ChampSim takes seven parameters: branch predictor, L1I prefetcher, L1D prefetcher, L2C prefetcher, LLC prefetcher, LLC replacement policy, and the number of cores.
For example, ./build_champsim.sh bimodal no no no no lru 1 builds a single-core processor with a bimodal branch predictor, no L1I/L1D/L2C/LLC prefetchers, and the baseline LRU replacement policy for the LLC.
$ ./build_champsim.sh bimodal no no no no lru 1
$ ./build_champsim.sh ${BRANCH} ${L1I_PREFETCHER} ${L1D_PREFETCHER} ${L2C_PREFETCHER} ${LLC_PREFETCHER} ${LLC_REPLACEMENT} ${NUM_CORE}
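If you script your builds, the argument order above can be captured in a small helper. The sketch below only assembles the argument list in the order shown; it does not run anything. The binary naming pattern is inferred from the example binary name used with run_champsim.sh and is an assumption, and both helper names are hypothetical.

```python
def build_command(branch, l1i_pref, l1d_pref, l2c_pref, llc_pref, llc_repl, num_core):
    """Return the build_champsim.sh argument list in the documented order."""
    return ["./build_champsim.sh", branch, l1i_pref, l1d_pref,
            l2c_pref, llc_pref, llc_repl, str(num_core)]


def binary_name(branch, l1i_pref, l1d_pref, l2c_pref, llc_pref, llc_repl, num_core):
    """Guess the resulting binary name.

    ASSUMPTION: the name is the parameters joined by '-' plus a 'core'
    suffix, as in bimodal-no-no-no-no-lru-1core.
    """
    return f"{branch}-{l1i_pref}-{l1d_pref}-{l2c_pref}-{llc_pref}-{llc_repl}-{num_core}core"


config = ("bimodal", "no", "no", "no", "no", "lru", 1)
cmd = build_command(*config)
name = binary_name(*config)
```

The list in cmd can be passed to subprocess.run from the ChampSim root directory.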
Professor Daniel Jimenez at Texas A&M University kindly provided traces for DPC-3. Use the following script to download these traces (~20GB size and max simpoint only).
Execute run_champsim.sh with proper input arguments. The default TRACE_DIR in run_champsim.sh is set to $PWD/dpc3_traces.
Single-core simulation: Run simulation with run_champsim.sh script.
Usage: ./run_champsim.sh [BINARY] [N_WARM] [N_SIM] [TRACE] [OPTION]
$ ./run_champsim.sh bimodal-no-no-no-no-lru-1core 1 10 400.perlbench-41B.champsimtrace.xz
${BINARY}: ChampSim binary compiled by "build_champsim.sh" (bimodal-no-no-no-no-lru-1core)
${N_WARM}: number of instructions for warmup, in millions (1 million)
${N_SIM}: number of instructions for detailed simulation, in millions (10 million)
${TRACE}: trace name (400.perlbench-41B.champsimtrace.xz)
${OPTION}: extra option for "-low_bandwidth" (src/main.cc)
Simulation results will be stored under "results_${N_SIM}M" as a form of "${TRACE}-${BINARY}-${OPTION}.txt".
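For batch runs it can help to compute the command line and the expected results file together. The sketch below follows the usage line and the file name pattern quoted above literally. How the file name looks when no option is given is not specified here, so the helper only covers the case with an option. Both helper names and the option value are hypothetical.

```python
def run_command(binary, n_warm, n_sim, trace, option=""):
    """Return the run_champsim.sh argument list: BINARY N_WARM N_SIM TRACE [OPTION]."""
    cmd = ["./run_champsim.sh", binary, str(n_warm), str(n_sim), trace]
    if option:
        cmd.append(option)
    return cmd


def results_file(binary, n_sim, trace, option):
    """Return results_${N_SIM}M/${TRACE}-${BINARY}-${OPTION}.txt.

    ASSUMPTION: the pattern is applied literally; the no-option case is
    not covered because it is not described.
    """
    return f"results_{n_sim}M/{trace}-{binary}-{option}.txt"


binary = "bimodal-no-no-no-no-lru-1core"
trace = "400.perlbench-41B.champsimtrace.xz"
cmd = run_command(binary, 1, 10, trace)
path = results_file(binary, 10, trace, "low_bandwidth")
```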
Multi-core simulation: Run simulation with run_4core.sh script.
Work on your algorithms with your favorite text editor
$ vim branch/mybranch.bpred
$ vim prefetcher/mypref.l1d_pref
$ vim prefetcher/mypref.l2c_pref
$ vim prefetcher/mypref.llc_pref
$ vim replacement/myrepl.llc_repl
We have included only 4 sample traces, taken from SPEC CPU 2006. These
traces are short (10 million instructions), and do not necessarily cover the range of behaviors your
replacement algorithm will likely see in the full competition trace list (not
included). We STRONGLY recommend creating your own traces, covering
a wide variety of program types and behaviors.
The included Pin Tool champsim_tracer.cpp can be used to generate new traces.
We used Pin 3.2 (pin-3.2-81205-gcc-linux), and it may require
installing libdwarf.so, libelf.so, or other libraries, if you do not already
have them. Please refer to the Pin documentation (https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/)
for working with Pin 3.2.
Note on compatibility: If you are using newer Linux kernels or Ubuntu versions (e.g., 20.04 LTS), you might run into issues (such as [1],[2],[3]) with Pin 3.2. The ChampSim tracer works fine with newer Pin versions that can be downloaded from here. Pin 3.17 is confirmed to work with Ubuntu 20.04.1 LTS.
Once downloaded, open tracer/make_tracer.sh and change PIN_ROOT to Pin's location.
Run ./make_tracer.sh to generate champsim_tracer.so.
Use the Pin tool like this
pin -t obj-intel64/champsim_tracer.so -- <your program here>
The tracer has three options you can set:
-o
Specify the output file for your trace.
The default is default_trace.champsim
-s <number>
Specify the number of instructions to skip in the program before tracing begins.
The default value is 0.
-t <number>
The number of instructions to trace, after -s instructions have been skipped.
The default value is 1,000,000.
For example, you could trace 200,000 instructions of the program ls, after
skipping the first 100,000 instructions, with this command:
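A sketch of how such a command line could be assembled from the usage line and the three tracer options above. It assumes the tracer's options are placed after the tool path and before the `--` separator, as is usual for Pin tools. The output file name ls_trace.champsim and the tracer_command helper are hypothetical.

```python
import shlex


def tracer_command(program, output="default_trace.champsim", skip=0,
                   count=1_000_000, tool="obj-intel64/champsim_tracer.so"):
    """Build a pin command line for champsim_tracer.so.

    The first -t selects the Pin tool; the second -t is the tracer's own
    option for the number of instructions to trace.
    """
    return ["pin", "-t", tool,
            "-o", output, "-s", str(skip), "-t", str(count),
            "--", *program]


# Trace 200,000 instructions of ls after skipping the first 100,000.
cmd = tracer_command(["ls"], output="ls_trace.champsim",
                     skip=100_000, count=200_000)
print(shlex.join(cmd))
```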
Traces created with the champsim_tracer.so are approximately 64 bytes per instruction,
but they generally compress down to less than a byte per instruction using xz compression.
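To plan disk space, the 64 bytes per instruction figure gives a quick estimate of the uncompressed size. The sketch below only does that multiplication; the figure is approximate, and the helper name is hypothetical.

```python
BYTES_PER_INSTRUCTION = 64  # approximate figure quoted above


def uncompressed_trace_bytes(num_instructions):
    """Estimate the uncompressed trace size in bytes."""
    return num_instructions * BYTES_PER_INSTRUCTION


default_trace = uncompressed_trace_bytes(1_000_000)   # default -t value
sample_trace = uncompressed_trace_bytes(10_000_000)   # a 10 million instruction trace
```

Under this estimate, the default one million instruction trace is about 64 MB before compression, and a 10 million instruction trace is about 640 MB.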
ChampSim measures the IPC (Instructions Per Cycle) value as a performance metric.
Some other useful metrics are printed out at the end of the simulation.
Quangmire/ChampSim
Modified ChampSim for ML Prefetching Competition
Traces:
Output File
Your Code:
Your code should have two modes of functioning:
Building, Running, and Evaluating
This has been rolled into one script, ml_prefetch_sim.py. Below there are some common use cases highlighted, but more information can be found for each of the subcommands by running:
where subcommand is any of build|run|eval
Building
Training
To use the above, you need to modify the model.py file with your model. The prefetch warm-up instructions specify how many to include in the training set. The remainder of the instructions are the evaluation set.
Generating the Prefetch File
To use the above, you need to modify the model.py file with your model. The prefetch warm-up instructions specify how many to include in the training set. The remainder of the instructions are the evaluation set.
Running
To run the baseline ChampSim binaries on an execution trace:
To additionally run the ChampSim binary with your prefetcher:
To run the ChampSim binary with your prefetcher only:
Evaluation
To evaluate the performance of the ML prefetcher (and compare it against the baselines of no prefetcher, Best Offset, SISB, and SISB Best Offset), run:
Competition Judging
Changes made to ChampSim for the competition:
ChampSim
Clone ChampSim repository
Compile
ChampSim takes seven parameters: branch predictor, L1I prefetcher, L1D prefetcher, L2C prefetcher, LLC prefetcher, LLC replacement policy, and the number of cores. For example, ./build_champsim.sh bimodal no no no no lru 1 builds a single-core processor with a bimodal branch predictor, no L1I/L1D/L2C/LLC prefetchers, and the baseline LRU replacement policy for the LLC.
Download DPC-3 trace
Run simulation
Execute run_champsim.sh with proper input arguments. The default TRACE_DIR in run_champsim.sh is set to $PWD/dpc3_traces.
Single-core simulation: Run simulation with run_champsim.sh script.
Simulation results will be stored under "results_${N_SIM}M" as a form of "${TRACE}-${BINARY}-${OPTION}.txt".
Multi-core simulation: Run simulation with run_4core.sh script.
Note that we need to specify multiple trace files for run_4core.sh. N_MIX is used to represent a unique ID for mixed multi-programmed workloads.
Add your own branch predictor, data prefetchers, and replacement policy
Copy an empty template
Work on your algorithms with your favorite text editor
Compile and test
How to create traces
Get this version of Pin:
Evaluate Simulation
Good luck and be a champion!