Memory timings - nicehash/NiceHashQuickMiner GitHub Wiki
Why is this important?
Until this version, users with any NVIDIA GPU with GDDR5X had to risk using EthEnlargementPill to reach better speeds and efficiency. EthEnlargementPill is a protected closed source binary. For the operation it does not need internet connection. Therefore it is especially suspicious because it does connect to some servers when you would expect it not to use any network resources. The issue was raised some time ago on Bitcointalk Forum. We are not saying that EthEnlargementPill is malicious, but considering being closed source, heavily protected and with only up to 100 lines of code that performs actual task while everything else is there for some unknown reasons, there are reasons to believe that not everything has been exposed what exactly this software does. Therefore we suggest you should NEVER run EthEnlargementPill on a computer that contains any sensitive information such as any logins or passwords or on a computer that you intend to use for any work that requires any kind of privacy. We were aware of the issue and worked hard to integrate this performance feature directly into NiceHash QuickMiner. Future versions will provide you with an automatic memory timings tuning.
From version 0.4.2.0 - ability to modify certain memory timings
Latest NiceHash QuickMiner has support for changing memory timings in OCTune! Set it and then save so it gets loaded every time Excavator is started up. Download from here: https://github.com/nicehash/NiceHashQuickMiner/releases
Note: we do not have access to any private NVAPI where everything is explained. We are doing reverse engineering and these are our findings. The only useful official documentation we got from NVIDIA regarding memory timings is their public information about memory tweaks. What we have discovered so far is collected in the following table:
Timing | Remarks |
---|---|
RC | this must be >=(RP+RAS) |
RFC | considerable boost possible |
RAS | considerable boost possible |
RP | considerable boost possible |
CFG0_R0 | |
CL | |
WL | |
RD_RCD | |
WR_RCD | |
CFG1_R0 | |
RPRE | |
WPRE | |
CDLR | |
WR | |
W2R_BUS | |
R2W_BUS | |
PDEX | |
PDEN2PDEX | |
FAW | EthEnlargementPill sets this to 16 (--revA sets to 20) |
AOND | |
CCDL | |
CCDS | |
REFRESH_LO | |
REFRESH | considerable boost possible |
RRD | EthEnlargementPill sets this to 4 (--revA sets to 5) |
DELAY0 | |
CFG4_R0 | |
ADR_MIN | |
CFG5_R0 | |
WRCRC | |
CFG5_R1 | |
OFFSET0 | |
DELAY0_MSB | |
OFFSET1 | |
OFFSET2 | |
DELAY01 |
Unfortunately, changing memory timings works only on Pascal and Volta series. If anyone has any tips that would get us to make this work on Turing and Ampere... there is a 1 BTC bounty for this piece of information!
Following cards are fully supported (tested):
GPU Model | Suggested timings |
---|---|
GP100 (Tesla) | It has huge issues with TLB thrashing so you will not get much speed regardless of timings, but 1 GB DAG gives you 70+ MH/s |
TITAN V | From 68 MH/s to 76 MH/s using following: "RC=45","RFC=251","RAS=23","RP=22","RD_RCD=10","FAW=12","REFRESH_LO=7","REFRESH=10","RRD=3" |
TITAN Xp | not tested yet, but most likely works |
GeForce GTX 1080 Ti | "FAW=16","RRD=4" is equ. what EthEnlargementPill does; hint: try negative memory clock and push memory timings even lower? |
GeForce GTX 1080 | "FAW=16","RRD=4" is equ. what EthEnlargementPill does |
GeForce GTX 1070 Ti | |
GeForce GTX 1070 | |
GeForce GTX 1060 6GB |
From this version on, simply use timing name equal value to set timing. You can set multiple of them; example to set FAW to 16 and RRD to 4:
[{
"time": 0,
"commands": [{
"id": 1,
"method": "device.set.memory.timings",
"params": ["0", "FAW=16", "RRD=4"]
}]
}, {
"time": 20,
"commands": [{
"id": 1,
"method": "workers.reset.all",
"params": []
}]
}, {
"time": 30,
"loop": 30,
"commands": [{
"id": 1,
"method": "worker.print.efficiencies",
"params": []
}]
}, {
"time": 1,
"loop": 4,
"commands": [{
"id": 1,
"method": "devices.smartfan.exec",
"params": []
}]
}, {
"event": "on_quit",
"commands": []
}, {
"event": "on_quickminer.start",
"commands": []
}, {
"event": "on_quickminer.stop",
"commands": []
}]
Calling device.get or devices.get also returns actual timings for each GPU device, example:
...
"gpu_memory_timings":
{
"bEditable": false,
"timings": {
"RC": 78,
"RFC": 210,
"RAS": 52,
"RP": 26,
"CFG0_R0": 0,
"CL": 24,
"WL": 5,
"RD_RCD": 26,
"WR_RCD": 16,
"CFG1_R0": 25,
"RPRE": 0,
"WPRE": 1,
"CDLR": 9,
"WR": 27,
"W2R_BUS": 7,
"R2W_BUS": 7,
"PDEX": 12,
"PDEN2PDEX": 2,
"FAW": 16,
"AOND": 0,
"CCDL": 2,
"CCDS": 2,
"REFRESH_LO": 5,
"REFRESH": 4,
"RRD": 4,
"DELAY0": 20,
"CFG4_R0": 28,
"ADR_MIN": 6,
"CFG5_R0": 0,
"WRCRC": 16,
"CFG5_R1": 0,
"OFFSET0": 39,
"DELAY0_MSB": 0,
"OFFSET1": 13,
"OFFSET2": 7,
"DELAY01": 12
}
},
...