# Logbook 2025 H1
- I am at the point where I wrote a test in the `DirectChainSpec` to open, close and fanout a head using Blockfrost. The test is not entirely using Blockfrost, since we also rely on a cardano-node on preview, for example for querying UTxO.
- The problem is that sometimes the faucet UTxO comes back empty, which is weird since I see at least one UTxO with a large amount of ada but also some NFTs.
- Why does `queryUTxO` fail to return the correct UTxO? I feel like this problem is not part of the thing I am trying to solve, but I definitely need a green test, otherwise I can't know whether the chain-following logic for Blockfrost works.
- Maybe I should test this in an e2e scenario instead and see how things behave there? Is it possible that the cardano-node is not completely in sync before the test is run?
- One thing I noticed: I am using Blockfrost with the faucet key to publish scripts at the beginning, but I am not awaiting the local cardano-node to see these transactions!
- Then, when I query the faucet UTxO, I either see an empty UTxO or get a BadInputs error, so I think I definitely need to await the script publishing transactions.
- This is far from optimal - perhaps I need to create equivalent functions that work with the Blockfrost API instead?
- After re-mapping all needed functions to Blockfrost versions and not using the local cardano-node for anything, I still get a BadInputs error.. hmm. At least I see the correct UTxO picked up, so I'll work my way from there. This is probably some logic in the UTxO seeding...
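A rough sketch of the kind of "await" helper I have in mind - purely illustrative, assuming nothing about the real hydra-node code: `awaitBlockfrostTx` and `lookupTx` are made-up names, and `lookupTx` stands for whatever Blockfrost transaction query ends up being used.

```haskell
import Control.Concurrent (threadDelay)

-- Poll a (Blockfrost-backed) lookup until the transaction shows up or we
-- run out of retries. 'lookupTx' is a placeholder for the actual query.
awaitBlockfrostTx :: Int -> IO (Maybe tx) -> IO (Maybe tx)
awaitBlockfrostTx retries lookupTx
  | retries <= 0 = pure Nothing
  | otherwise = do
      found <- lookupTx
      case found of
        Just tx -> pure (Just tx)
        Nothing -> do
          threadDelay 1000000 -- one second between polls
          awaitBlockfrostTx (retries - 1) lookupTx
```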
- After adding a Blockfrost equivalent for `awaitForTransaction` I am still at the same place - makes sense. The produced output is not the problem, the actual transaction is.
- I pretty printed the faucet UTxO and the tx and I don't see anything weird.
- Important note: we get a valid tx when building, but Blockfrost returns an error when submitting!
- Decided to find just a single UTxO responsible for the seeding from the faucet, just to reduce clutter, but the error is the same:
Faucet UTxO: 815e52d1ee#0 ↦ 54829439 lovelace
62fb023528#1 ↦ 9261435223 lovelace
70e6c21881#1 ↦ 129433058019 lovelace + 1 13d1f7feab83ff4db444bf96b8677949c5bf9c709671f30ff8f33ab3.487964726120446f6f6d202d2033726420506c6163652054726f706879 + 1 19c98d04cdb6e1e782a73e693697d4a46ca9820d5d490a3bf6470a07.487964726120446f6f6d202d20326e6420506c6163652054726f706879 + 1 1a22028742629f3cf38b3d1036a088fea59eb30237a675420fb25c11.2331 + 1 6d92350897706b14832c62c5b5644e918f0b6b3b63ffc00a1a463828.487964726120446f6f6d202d2031737420506c6163652054726f706879 + 1 ad39d849181dc206488fd726240c00b55547153ffdca8c079e1e34d9.487964726120446f6f6d202d2031737420506c6163652054726f706879 + 1 bfe4ab531fd625ef33ea355fd85953eb944bffa401af767666ff411c.487964726120446f6f6d202d2031737420506c6163652054726f706879 + 1 c953682b6eb5891c0bda35718c5261587d57e5e408079cbeb8cf881a.2331 + 1 cd6076d9d0098da4c7670c08f230e4efe31d666263c9db5196805d6e.487964726120446f6f6d202d2031737420506c6163652054726f706879 + 1 d0c91707d75011026193c0fce742443dde66fa790936981ece5d9f8b.2331 + 69918000000 d8906ca5c7ba124a0407a32dab37b2c82b13b3dcd9111e42940dcea4.0014df105553444d + 1 dd7e36888a487f8b27687f65abd93e6825b4eb3ce592ee5f504862df.487964726120446f6f6d202d2031737420506c6163652054726f706879 + 1 fa10c5203512eeeb92bf79547b09f5cdb2e008689864b0175cca6fee.487964726120446f6f6d202d2034746820506c6163652054726f706879
Found UTxO: 62fb023528#1 ↦ 9261435223 lovelace
"f99907e0b4e3c9d554a68e76c3a72b4090cffb5c12d0cd471e29e1d0fa7184d2"
== INPUTS (1)
- cd62585298998cd809f6fe08a4af3087dab8f73ed67132b8c8fd4162fb023528#1
ShelleyAddress Testnet (KeyHashObj (KeyHash {unKeyHash = "9783be7d3c54f11377966dfabc9284cd6c32fca1cd42ef0a4f1cc45b"})) StakeRefNull
9261435223 lovelace
TxOutDatumNone
ReferenceScriptNone
== COLLATERAL INPUTS (0)
== REFERENCE INPUTS (0)
== OUTPUTS (2)
Total number of assets: 1
- ShelleyAddress Testnet (KeyHashObj (KeyHash {unKeyHash = "f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d"})) StakeRefNull
100000000 lovelace
TxOutDatumNone
- ShelleyAddress Testnet (KeyHashObj (KeyHash {unKeyHash = "9783be7d3c54f11377966dfabc9284cd6c32fca1cd42ef0a4f1cc45b"})) StakeRefNull
9161262858 lovelace
TxOutDatumNone
== TOTAL COLLATERAL
TxTotalCollateralNone
== RETURN COLLATERAL
TxReturnCollateralNone
== FEE
TxFeeExplicit ShelleyBasedEraConway (Coin 172365)
== VALIDITY
TxValidityNoLowerBound
TxValidityUpperBound ShelleyBasedEraConway Nothing
== MINT/BURN
0 lovelace
== SCRIPTS (0)
Total size (bytes): 0
== DATUMS (0)
== REDEEMERS (0)
== REQUIRED SIGNERS
[]
== METADATA
TxMetadataNone
can open, close & fanout a Head using Blockfrost [✘]
Failures:
test/Test/DirectChainSpec.hs:385:3:
1) Test.DirectChain can open, close & fanout a Head using Blockfrost
uncaught exception: APIBlockfrostError
BlockfrostError "BlockfrostBadRequest \"{\\\"contents\\\":{\\\"contents\\\":{\\\"contents\\\":{\\\"era\\\":\\\"ShelleyBasedEraConway\\\",\\\"error\\\":[\\\"ConwayUtxowFailure (UtxoFailure (ValueNotConservedUTxO (MaryValue (Coin 0) (MultiAsset (fromList []))) (MaryValue (Coin 9261435223) (MultiAsset (fromList [])))))\\\",\\\"ConwayUtxowFailure (UtxoFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash \\\\\\\"cd62585298998cd809f6fe08a4af3087dab8f73ed67132b8c8fd4162fb023528\\\\\\\"}) (TxIx {unTxIx = 1})])))\\\"],\\\"kind\\\":\\\"ShelleyTxValidationError\\\"},\\\"tag\\\":\\\"TxValidationErrorInCardanoMode\\\"},\\\"tag\\\":\\\"TxCmdTxSubmitValidationError\\\"},\\\"tag\\\":\\\"TxSubmitFail\\\"}\""
- I don't see anything wrong with the tx, but Blockfrost seems to think the input is invalid for whatever reason. I'll explore the tx endpoints to see if I can find something useful.
- Checked the mapping between blockfrost/cardano when creating UTxO and all looks good.
- Added an `awaitForTransaction` Blockfrost variant, which didn't help with the error.
- Made all functions in `Blockfrost.Client` work in `BlockfrostClientT IO` so I can run them all from the outside (I suspected that opening multiple Blockfrost connections could cause problems), and this didn't help either.
- I think all these changes are good to keep, but I still don't see why submitting the seeding tx fails.
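For reference, the shape of running several queries through one `runBlockfrost` call rather than one connection per query - a sketch based on blockfrost-client's documented entry points (`projectFromFile`, `runBlockfrost`, `getLatestBlock`); the token path is made up:

```haskell
import Blockfrost.Client

main :: IO ()
main = do
  prj <- projectFromFile "/secrets/blockfrost-preview-token" -- made-up path
  res <- runBlockfrost prj $ do
    tip <- getLatestBlock
    -- ...further queries (UTxO, protocol parameters, ...) would be chained
    -- here, all within the same client context
    pure tip
  print res
```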
- How to deal with incompatible deposits? We do observe them, but should the head logic track them?
- When introducing a `currentTime` to the open state (in order to determine whether a deadline is good or not) I realize that `Tick` contents would be useful to have on the `Observation` chain event, which would be easily possible. How is the contestation deadline done?
  - Aha! The need for tracking a `currentTime : UTCTime` in the `HeadState` can be worked around by tracking all deposits and discarding them on tick.
  - Hm.. but that would move the decision whether to snapshot a pending deposit only to the `Tick` handling. Which means that it may only happen on the next block..
  - But this is where the deposit tracking needs to go anyway.. we will never issue a snapshot directly when observing the deposit (that's why we are here), and if we decouple the issuance from the observation, the logic needs to go to the `Tick` handling anyway!
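A minimal sketch of the "discard deposits on tick" idea - the types here are simplified stand-ins, not the real `HeadState` or deposit types:

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Time (UTCTime)

data PendingDeposit = PendingDeposit
  { depositDeadline :: UTCTime
  , depositUTxO :: [String] -- placeholder for a real UTxO type
  }

-- On every Tick we learn the (chain) time, so instead of storing a
-- currentTime in the HeadState we can simply drop expired deposits here.
pruneDeposits :: UTCTime -> Map String PendingDeposit -> Map String PendingDeposit
pruneDeposits now = Map.filter (\d -> depositDeadline d > now)
```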
- When changing `CommitRecorded` I stumble over `newLocalUTxO = localUTxO <> deposited` .. why would we want to update our local ledger already when recording the deposit!? This was likely a bug too..
- The specification will be quite different from what we need to implement: there are no deposits tracked, only a `wait` for any previous pending deposits. To what level do we need to specify the logic of delaying deposits and checking deadlines?
- Why was the increment snapshotting only waiting for an "unresolved Decommit" before requesting a snapshot?
- Why do we need to wait at all (for other decommits or commits) if there is no snapshot in flight and we are the leader.. why not just snapshot what we want?
- After moving the incremental commit snapshot decision to the `Tick` handling, the model fails because a `NewTx` can't spend a UTxO added through a `Deposit` before -> interesting!
- After bringing back a Uα equivalent to the `HeadLogic`, the model spec consistently finds an empty `utxoToCommit` which fails to submit an `incrementTx` -> good!
- Interestingly, the model allows `action $ Deposit {headIdVar = var2, utxoToDeposit = [], deadline = 1864-06-16 04:36:38.606749385646 UTC}`, which obviously results in an empty UTxO to commit.. this can happen in the wild too!
  - Unclear where exactly we want to deal with empty deposits.
- Back to where we started, with a very old `Deposit` and the node trying to do an `increment` with the deadline already passed. This should be easy to fix by just not trying to snapshot it. However, what if a dishonest `hydra-node` would do just that? Would we approve that snapshot? Certainly the on-chain script would forbid it, but this could stall the head.
  - This is similar to the empty UTxO thing. While we can make our honest `hydra-node` not do funky stuff, we must ensure that we do not sign snapshots that are funky!
  - Which tests would best capture this? The `ModelSpec` won't see these issues once our honest implementation stops requesting funky snapshots!
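To make the "never sign funky snapshots" point concrete, a guard along these lines would sit in the snapshot acknowledgement path - again a simplified sketch, not the actual `HeadLogic` types:

```haskell
import Data.Time (UTCTime)

data SnapshotRequest = SnapshotRequest
  { utxoToCommit :: Maybe [String] -- placeholder for the committed UTxO
  , depositDeadline :: Maybe UTCTime
  }

-- Refuse to sign if the requested snapshot commits an empty UTxO or a
-- deposit whose deadline has already passed, no matter who requested it.
shouldSignSnapshot :: UTCTime -> SnapshotRequest -> Bool
shouldSignSnapshot now req =
  maybe True (not . null) (utxoToCommit req)
    && maybe True (> now) (depositDeadline req)
```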
- To determine whether a deposit is still (or already) fine, we are back to needing a notion of `UTCTime` when making that decision? We could do that updating in the `Tick` handling and keep information about a deposit being `Outdated` or so. Then, the snapshot acknowledgment code can tell whether a deposit is valid and only sign if it is.
  - Tracking a full `Deposit` type in `pendingDeposits`, which has a `DepositStatus`.
  - With the new `Deposit` type I can easily mark deposits as `Expired` and need to fix several behavior tests to use realistic deadlines. However, the observability in tests is lacking and I definitely need a `DepositExpired` server output to fix all tests.
- Deposit fixes: How to test this situation? I need a test suite that includes the off-chain logic, but also allows control over rollbacks and spending inputs.
- Model based tests are not including incremental commits :(
- TxTraceSpec contains deposit/increment, but only exercises the L1 related code
- The behavior tests do cover deposit/increment behavior, but deposit observations are only injected! So rollbacks would not cover them.
- Let's bite the bullet.. at least the model-based `MockChain` could be easily adapted to do deposits in `simulateCommit`?
- Ran into the same issue as we had on CI when shrinking was failing on a partial `!`. Guarding the `shrinkAction` to only include actions if their `party` is still in the seed seems to fix this.. but now shrinking does not terminate?
  - Detour on improving shrinking and counterexamples of that `checkModel` problem .. shifting back to fixing deposits.
- After adding `Deposit` actions, implementing a `simulateDeposit` and adjusting some generators/preconditions, I consistently run into test failures with `deadline <- arbitrary`. This is already interesting! The `hydra-node` seems to still try to increment deposits with deadlines very far in the past (year 1864) -> first bug found and reproducible!
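The generator fix is probably something along these lines - a sketch only, and `genDeadline` is not the real name in the test suite:

```haskell
import Data.Time (UTCTime, addUTCTime)
import Test.QuickCheck (Gen, choose)

-- Instead of 'deadline <- arbitrary' (which happily yields year-1864
-- deadlines), pick a deadline relative to a known "now".
genDeadline :: UTCTime -> Gen UTCTime
genDeadline now = do
  secs <- choose (60, 3600 :: Integer) -- one minute to one hour ahead
  pure $ addUTCTime (fromInteger secs) now
```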
-
After using blockfrost query to get all eras and try to construct
EraHistory
I was surprised to discover that usingnonEmptyFromList
fails. -
I know for sure that I am not constructing empty list here so this is confusing.
-
Found the example in the atlas repo https://atlas-app.io/ but those were also failing, which is even more surprising.
-
When looking at the blockfrost query results I noticed there are multiple
NetworkEraSummary
that start and end with slot 0 which is surprising:
eras: [ NetworkEraSummary
{ _networkEraStart = NetworkEraBound
{ _boundEpoch = Epoch 0
, _boundSlot = Slot 0
, _boundTime = 0s
}
, _networkEraEnd = NetworkEraBound
{ _boundEpoch = Epoch 0
, _boundSlot = Slot 0
, _boundTime = 0s
}
, _networkEraParameters = NetworkEraParameters
{ _parametersEpochLength = EpochLength 4320
, _parametersSlotLength = 20s
, _parametersSafeZone = 864
}
}
, NetworkEraSummary
{ _networkEraStart = NetworkEraBound
{ _boundEpoch = Epoch 0
, _boundSlot = Slot 0
, _boundTime = 0s
}
, _networkEraEnd = NetworkEraBound
{ _boundEpoch = Epoch 0
, _boundSlot = Slot 0
, _boundTime = 0s
}
, _networkEraParameters = NetworkEraParameters
{ _parametersEpochLength = EpochLength 86400
, _parametersSlotLength = 1s
, _parametersSafeZone = 25920
}
}
-
After removing them I can parse
EraHistory
with success but the question is how to filter out values from blockfrost? Which are valid eras? -
I'll try filtering all eras that start and end with slot 0
-
This worked - I reported what I found to the blockfrost guys
-
Now it is time to move forward and test if the wallet queries actually work
-
I picked one
DirectChainTest
and decided to alter it so it runs on preview usingwithCardanoNodeOnKnownNetwork
but I get
test/Test/DirectChainSpec.hs:124:3:
1) Test.DirectChain can init and abort a 2-parties head after one party has committed
uncaught exception: QueryException
QueryProtocolParamsEncodingFailureOnEra (AnyCardanoEra AlonzoEra) "Error in $: key \"poolVotingThresholds\" not found"
-
It seems like re-mapping the protocol params from blockfrost fails on
poolVotingThresholds
. -
This happens immediately when cardano-node reports
MsgSocketIsReady
cardano-node --version
cardano-node 10.1.4 - linux-x86_64 - ghc-8.10
git rev 1f63dbf2ab39e0b32bf6901dc203866d3e37de08
- I can see that this field exists in the
conway-genesis.json
in the tmp folder of a test run
-
After PR review comments from FT I wanted to add one suggestion and that is to see the Head closed and finalized after initially committing and then decommitting some UTxO.
-
This leads to
H28
error on close and this means we tried to close with initial snapshot but in fact we already got the confirmed snapshot. -
When inspecting the logs I found out that the node, after a restart, does not observe any
SnapshotConfirmed
and therefore tries to close with initial one which fails. -
Question is: Why did the restarted node failed to re-observe confirmed snapshot event?
-
Added some test code to wait and see
SnapshotConfirmed
in the restarted node to confirm it actually sees this event happening and the test fails exactly at this point. -
When both nodes are running I can view the snapshot confirmed message is there but after a restart - node fails to see
SnapshotConfirmed
message again. -
In the logs for both node 1 and 2 before restart I see two
SnapshotConfirmed
messages but in the restarted node these events are gone. -
I realized the close works if I close from node that was not restarted but what I want to do is wait for the restarted node to catch up and then close.
-
I removed fiddling with the recover and wanted to get this basic test working but closing with restarted node, even after re-observing the last decommit, fails with
H28
FailedCloseInitial
. -
This means the restarted node tried to close with the initial snapshot but one of the values doesn't match. We expect the version to be 0, snapshot number to be 0 and utxo hash should match the initial one.
-
last-known-revision for both nodes before I shutdown one of them is 11 but the restarted node, after removing the last-known-revision file ends up having value 13. How come it received more messages?
-
When comparing the state files I see discrepancies in eventId and the restarted node has a
DecommitRecorded
as the last event (other than ticks) -
Regular node decommit recorded:
{"eventId":44,"stateChanged":{"decommitTx":{"cborHex":"84a300d9010281825820ad7458781dc19e427fca77c8c7b2db1b56c81c11590e2ae3999f2f13db8c51c200018182581d60f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d1a004c4b400200a100d9010281825820eb94e8236e2099357fa499bfbc415968691573f25ec77435b7949f5fdfaa5da0584071b6c5956083ff7ac7ad49d5a75c77967b5ad2e7fd756c1de226f71cdf89e5d383bc88975c9ca7deab135f4ea9014666aa0e257f26bdd94dda2df60c922e9306f5f6","description":"","txId":"3095040e42ed9b193f8a66699b1631c17a85f670aee3c4d77fb3cfb195ea6bcb","type":"Tx ConwayEra"},"headId":"654b2b0e5ff3e0a902a12918b63628cdd478364caa4f0c758e6f7490","newLocalUTxO":{},"tag":"DecommitRecorded","utxoToDecommit":{"3095040e42ed9b193f8a66699b1631c17a85f670aee3c4d77fb3cfb195ea6bcb#0":{"address":"addr_test1vru2drx33ev6dt8gfq245r5k0tmy7ngqe79va69de9dxkrg09c7d3","datum":null,"datumhash":null,"inlineDatum":null,"inlineDatumRaw":null,"referenceScript":null,"value":{"lovelace":5000000}}}},"time":"2025-04-10T07:30:58.882632162Z"}
- Restarted node decommit recorded
{"eventId":76,"stateChanged":{"decommitTx":{"cborHex":"84a300d9010281825820ad7458781dc19e427fca77c8c7b2db1b56c81c11590e2ae3999f2f13db8c51c200018182581d60f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d1a004c4b400200a100d9010281825820eb94e8236e2099357fa499bfbc415968691573f25ec77435b7949f5fdfaa5da0584071b6c5956083ff7ac7ad49d5a75c77967b5ad2e7fd756c1de226f71cdf89e5d383bc88975c9ca7deab135f4ea9014666aa0e257f26bdd94dda2df60c922e9306f5f6","description":"","txId":"3095040e42ed9b193f8a66699b1631c17a85f670aee3c4d77fb3cfb195ea6bcb","type":"Tx ConwayEra"},"headId":"654b2b0e5ff3e0a902a12918b63628cdd478364caa4f0c758e6f7490","newLocalUTxO":{},"tag":"DecommitRecorded","utxoToDecommit":{"3095040e42ed9b193f8a66699b1631c17a85f670aee3c4d77fb3cfb195ea6bcb#0":{"address":"addr_test1vru2drx33ev6dt8gfq245r5k0tmy7ngqe79va69de9dxkrg09c7d3","datum":null,"datumhash":null,"inlineDatum":null,"inlineDatumRaw":null,"referenceScript":null,"value":{"lovelace":5000000}}}},"time":"2025-04-10T07:31:02.301798566Z"}
-
Let's try to see the decommit timeline between two states (I am aware these event's do not need to be in order but I think etcd should deliver in order after restart)
-
So let's track this decommit between two nodes
DecommitRecorded
running node 2025-04-10T07:30:58.882632162Z
restarted node 2025-04-10T07:31:02.301798566Z
DecommitApproved
running node 2025-04-10T07:30:58.894604418Z
restarted node missing event
DecommitFinalized
running node 2025-04-10T07:30:59.007515339Z
restarted node 2025-04-10T07:31:02.300503374Z
- So it seems like the restarted node is late couple of seconds but how can it
be that in the test we wait to see
DecommitFinalized
and if we try to close after the restarted node still thinks it is at version 0?
-
Trying out
dingo
and whether I could hook it up tohydra-node
-
When synchronizing
preview
withdingo
the memory footprint was growing as the sync progressed, but did not increase to the same level when restarting the chain sync (although it picked up the starting slot etc.)
The system was swapping a lot of memory too (probably reached max of my 32GB)
-
Querying address of latest hydra head address shows two heads on
preview
, but our explorer only shows one? -
Querying the
dingo
node seems to work, but I get a hydra scripts discovery error?MissingScript {scriptName = "\957Initial", scriptHash = "c8a101a5c8ac4816b0dceb59ce31fc2258e387de828f02961d2f2045", discoveredScripts = fromList ["0e35115a2c7c13c68ecd8d74e4987c04d4539e337643be20bb3274bd"]}
-
Indeed
dingo
behaves slightly different on thequeryUTxOByTxIn
local state query: when requesting three txins, it only responds with one utxo[ TxIn "b7b88533de303beefae2d8bb93fe1a1cd5e4fa3c4439c8198c83addfe79ecbdc" ( TxIx 0 ) , TxIn "da1cc0eef366031e96323b6620f57bc166cf743c74ce76b6c3a02c8f634a7d20" ( TxIx 0 ) , TxIn "6665f1dfdf9b9eb72a0dd6bb73e9e15567e188132b011e7cf6914c39907ac484" ( TxIx 0 ) ] returned utxo: 1
-
After fixing that to query three times, the next stumbling block seems to come from the chain sync:
bearer closed: "<socket: 23> closed when reading data, waiting on next header True"
-
Maybe something on the n2c handshake does not work? On dingo side I see:
{"time":"2025-04-05T13:47:05.495636842+02:00","level":"INFO","msg":"listener: accepted connection from unix@629","component":"connmanager"} {"time":"2025-04-05T13:47:05.4957064+02:00","level":"ERROR","msg":"listener: failed to setup connection: could not register protocol with muxer","component":"connmanager"}
-
When debugging how far we get on the handshake protocol I learn how
gouroboros
implements the state transitions of the miniprotocols usingStateMap
. -
I realize that now the query for scripts does not even work.. maybe my instrumentation broke something? Also.. all my instrumentation happened on vendored code in `vendor/` of the dingo repo. I wonder how developers do this kind of editing most conveniently in this setup?
The
Chain.Direct
switch toconnectToLocalNodeWithVersions
was problematic, now it fetches the scripts correctly and the chain sync starts -
It's definitely flaky in how "far" we get.. maybe the
dingo
node is only accepting n2c connections while connected upstream on n2n (I have been in a train with flaky connection). -
Once it progressed now onto a
RollForward
where thequeryTimeHandle
would query theEraHistory
and fail time conversion with error:TimeConversionException {slotNo = SlotNo 77202345, reason = "PastHorizon {pastHorizonCallStack = [(\"runQuery\",SrcLoc {srcLocPackage = \"ouroboros-consensus-0.22.0.0-f90d7bc7c4431d706016c293a932800b9c1e28c3b268597acc5b945a9be83125\", srcLocModule = \"Ouroboros.Consensus.HardFork.History.Qry\", srcLocFile = \"src/ouroboros-consensus/Ouroboros/Consensus/HardFork/History/Qry.hs\", srcLocStartLine = 439, srcLocStartCol = 44, srcLocEndLine = 439, srcLocEndCol = 52}),(\"interpretQuery\",SrcLoc {srcLocPackage = \"hydra-node-0.21.0-inplace\", srcLocModule = \"Hydra.Chain.Direct.TimeHandle\", srcLocFile = \"src/Hydra/Chain/Direct/TimeHandle.hs\", srcLocStartLine = 91, srcLocStartCol = 10, srcLocEndLine = 91, srcLocEndCol = 24}),(\"slotToUTCTime\",SrcLoc {srcLocPackage = \"hydra-node-0.21.0-inplace\", srcLocModule = \"Hydra.Chain.Direct.TimeHandle\", srcLocFile = \"src/Hydra/Chain/Direct/TimeHandle.hs\", srcLocStartLine = 86, srcLocStartCol = 7, srcLocEndLine = 86, srcLocEndCol = 20}),(\"mkTimeHandle\",SrcLoc {srcLocPackage = \"hydra-node-0.21.0-inplace\", srcLocModule = \"Hydra.Chain.Direct.TimeHandle\", srcLocFile = \"src/Hydra/Chain/Direct/TimeHandle.hs\", srcLocStartLine = 116, srcLocStartCol = 10, srcLocEndLine = 116, srcLocEndCol = 22})], pastHorizonExpression = Some (EPair (ERelToAbsTime (ERelSlotToTime (EAbsToRelSlot (ELit (SlotNo 77202345))))) (ESlotLength (ELit (SlotNo 77202345)))), pastHorizonSummary = [EraSummary {eraStart = Bound {boundTime = RelativeTime 0s, boundSlot = SlotNo 0, boundEpoch = EpochNo 0}, eraEnd = EraEnd (Bound {boundTime = RelativeTime 0s, boundSlot = SlotNo 0, boundEpoch = EpochNo 0}), eraParams = EraParams {eraEpochSize = EpochSize 4320, eraSlotLength = SlotLength 20s, eraSafeZone = StandardSafeZone 0, eraGenesisWin = GenesisWindow {unGenesisWindow = 0}}},EraSummary {eraStart = Bound {boundTime = RelativeTime 0s, boundSlot = SlotNo 0, boundEpoch = EpochNo 0}, eraEnd = EraEnd (Bound {boundTime = RelativeTime 0s, boundSlot = SlotNo 0, boundEpoch = EpochNo 0}), eraParams = EraParams {eraEpochSize = EpochSize 86400, eraSlotLength = SlotLength 1s, eraSafeZone = StandardSafeZone 0, eraGenesisWin = GenesisWindow {unGenesisWindow = 0}}},EraSummary {eraStart = Bound {boundTime = RelativeTime 0s, boundSlot = SlotNo 0, boundEpoch = EpochNo 0}, eraEnd = EraEnd (Bound {boundTime = RelativeTime 0s, boundSlot = SlotNo 0, boundEpoch = EpochNo 0}), eraParams = EraParams {eraEpochSize = EpochSize 86400, eraSlotLength = SlotLength 1s, eraSafeZone = StandardSafeZone 0, eraGenesisWin = GenesisWindow {unGenesisWindow = 0}}},EraSummary {eraStart = Bound {boundTime = RelativeTime 0s, boundSlot = SlotNo 0, boundEpoch = EpochNo 0}, eraEnd = EraEnd (Bound {boundTime = RelativeTime 0s, boundSlot = SlotNo 0, boundEpoch = EpochNo 0}), eraParams = EraParams {eraEpochSize = EpochSize 86400, eraSlotLength = SlotLength 1s, eraSafeZone = StandardSafeZone 0, eraGenesisWin = GenesisWindow {unGenesisWindow = 0}}},EraSummary {eraStart = Bound {boundTime = RelativeTime 0s, boundSlot = SlotNo 172800, boundEpoch = EpochNo 2}, eraEnd = EraEnd (Bound {boundTime = RelativeTime 259200s, boundSlot = SlotNo 86400, boundEpoch = EpochNo 1}), eraParams = EraParams {eraEpochSize = EpochSize 86400, eraSlotLength = SlotLength 1s, eraSafeZone = StandardSafeZone 0, eraGenesisWin = GenesisWindow {unGenesisWindow = 0}}},EraSummary {eraStart = Bound {boundTime = RelativeTime 259200s, boundSlot = SlotNo 55728000, boundEpoch = EpochNo 645}, eraEnd = EraEnd (Bound {boundTime = RelativeTime 
55814400s, boundSlot = SlotNo 345600, boundEpoch = EpochNo 4}), eraParams = EraParams {eraEpochSize = EpochSize 86400, eraSlotLength = SlotLength 1s, eraSafeZone = StandardSafeZone 0, eraGenesisWin = GenesisWindow {unGenesisWindow = 0}}},EraSummary {eraStart = Bound {boundTime = RelativeTime 55814400s, boundSlot = SlotNo 77155200, boundEpoch = EpochNo 893}, eraEnd = EraEnd (Bound {boundTime = RelativeTime 77241600s, boundSlot = SlotNo 55900800, boundEpoch = EpochNo 647}), eraParams = EraParams {eraEpochSize = EpochSize 86400, eraSlotLength = SlotLength 1s, eraSafeZone = StandardSafeZone 0, eraGenesisWin = GenesisWindow {unGenesisWindow = 0}}}]}"}
-
I saw that same error when using `cardano-cli query tip` .. seems like the era history local state query is not accurately reporting epoch bounds.
I conclude that `dingo` is easy to use and navigate around, but the N2C API is not complete yet. Maybe my work on the LocalStateQuery API in `cardano-blueprint` could benefit the project and make `gouroboros` more conformant (at least from a message serialization point of view).
-
Non-profiled Haskell binaries can be inspected using the `-s` and `-hT` RTS arguments.
Running the
hydra-node
using a 2GB state file as provided by GD the node will load the state and then fail on mismatched keys (as we have not the right ones):151,712,666,608 bytes allocated in the heap 14,411,335,656 bytes copied during GC 973,747,296 bytes maximum residency (53 sample(s)) 24,460,192 bytes maximum slop 2033 MiB total memory in use (0 MiB lost due to fragmentation)
-
The
peekForeverE
in https://github.com/cardano-scaling/hydra/pull/1919 seem not to make any difference:151,712,692,632 bytes allocated in the heap 14,409,258,352 bytes copied during GC 973,732,032 bytes maximum residency (53 sample(s)) 24,545,088 bytes maximum slop 2033 MiB total memory in use (0 MiB lost due to fragmentation)
-
Using
hT
a linear growth of memory can be seen quite easily. -
First idea: the `lastEventId` conduit was using `foldMapC`, which might be building thunks via `mappend`.
  - Nope, that was not the issue.
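For context, the hypothesis was that `foldMapC (Last . pure . getEventId)` accumulates a chain of `<>` thunks; a strict left fold would sidestep that. A sketch only, and as noted above it turned out not to be the actual leak:

```haskell
import Conduit (ConduitT, foldlC)

-- Keep only the last event id with a strict left fold instead of
-- mappending 'Last' values together.
lastEventIdC :: Monad m => (event -> eventId) -> ConduitT event o m (Maybe eventId)
lastEventIdC getEventId = foldlC (\_ e -> Just (getEventId e)) Nothing
```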
-
That was not the issue.. disabling aggregation of `chainStateHistory` and only loading `headState` next.
  - Still linear growth.. so the culprit most likely is inside the main loading of `headState` (besides other issues?)
-
Let's turn on `StrictData` on all of `HeadLogic` as a first stab at getting stricter usage of `HeadState` et al.
This works! Making
HeadLogic{.State, .Outcome}
allStrictData
already pushes the heap usage down ~5MB! -
Possible explanation: with gigabytes of state updates we have almost exclusively `TransactionReceived` et al state changes. In `aggregate` we usually build up thunks like `allTxs = allTxs <> fromList [(txId tx, tx)]`, which will leak memory until forced into one concrete value when first showing the `HeadState` (which will probably collapse the memory usage again).
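An illustration of the suspected leak and the obvious counter-measure - not the actual `aggregate` code, just the idea: use a strict map and force the accumulator so each event is folded in eagerly.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- The lazy 'allTxs = allTxs <> fromList [(txId tx, tx)]' builds a chain of
-- <> thunks. Inserting into a strict map with a forced accumulator avoids
-- that. 'txid'/'tx' stand in for the real types.
recordTx :: Ord txid => txid -> tx -> Map txid tx -> Map txid tx
recordTx txid tx !allTxs = Map.insert txid tx allTxs
```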
With
StrictData
we have a maximum residency of 10MB after loading 2GB of state events:152,176,815,256 bytes allocated in the heap 16,702,572,088 bytes copied during GC 9,967,848 bytes maximum residency (2387 sample(s)) 215,600 bytes maximum slop 43 MiB total memory in use (0 MiB lost due to fragmentation)
-
Trying to narrow in on the exact source of the memory leak so I do not need to put bangs everywhere.
-
allTxs
andlocalTxs
assignments are not the source of it .. maybe thecoordinatedHeadState
record update? -
No .. also not really. Maybe it's time to recompile with
profiling
enabled and make some coffee (this will take a while). -
When using
profiling: True
using thehaskell.nix
managed dependencies, I ran into this error: -
Setting
enableProfiling = true
in the haskell.nix projectmodules
rebuilds the whole world, but that is expected. -
Hard to spot where exactly we are creating the space leak / thunks. This blog post is helpful still: http://blog.ezyang.com/2011/06/pinpointing-space-leaks-in-big-programs/
-
I am a bit confused why so many of the cost centres point to parsing and decoding code .. maybe the transactions themselves (which make up the majority of the data) are not forced for long? This would make sense because the `HeadLogic` does not inspect the transactions themselves (much).
Only strictness annotations on a
!tx
did not help, but let's try aStrictData
onStateChanged
-
StrictData
onHeadLogic.Outcome
does not fix it … so it must be something related to theHeadState
. -
The retainer profile actually points quite clearly to
aggregate
. -
The biggest things on the heap are bytes, thunks and types related to a cardano transaction body.
-
-
Going back to zero in on branches of `aggregate` via exclusion:
  - Disabling all `CoordinatedHeadState` modifications makes memory usage minimal again
  - Enabling `SnapshotConfirmed` -> still bounded
  - Enabling `PartySignedSnapshot` -> still bounded
  - Enabling `SnapshotRequested` -> growing!
  - Without the `allTxs` update -> bounded!
  - This line creates thunks!? `allTxs = foldr Map.delete allTxs requestedTxIds`
  - Neither forcing `allTxs` nor `requestedTxIds` helped
  - Is it really only this line? Enabling all other `aggregate` updates to `CoordinatedHeadState`
  - It's both `allTxs` usages
  - If we only make the `allTxs` field strict? -> Bounded!
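So the conclusion in code form, roughly - an illustrative record, not the real `CoordinatedHeadState` definition:

```haskell
import Data.Map (Map)

-- A strictness annotation on just the 'allTxs' field is enough to keep the
-- aggregate fold bounded, because the updated map is forced whenever the
-- new state is constructed.
data CoordinatedHeadState txid tx = CoordinatedHeadState
  { allTxs :: !(Map txid tx)
  , localTxs :: [tx]
  }
```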
-
After easy changes to
FanoutTx
to include observed UTxO instead of using the confirmed snapshot there are problems in theDirectChainSpec
andModel
. -
Let's look at
DirectChainSpec
first - I need to come up with a utxo value for this line here:
aliceChain `observesInTime` OnFanoutTx headId mempty
- Failed test looks like this:
test/Test/DirectChainSpec.hs:578:35:
1) Test.DirectChain can open, close & fanout a Head
expected: OnFanoutTx {headId = UnsafeHeadId "eK+\SO_\243\224\169\STX\161)\CAN\182\&6(\205\212x6L\170O\fu\142ot\144", fanoutUTxO = fromList [(TxIn "0762c8de902abe1e292e691066328c932d95e29c9a564d466e8bc791527e359f" (TxIx 0),TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraConway) (ShelleyAddress Testnet (KeyHashObj (KeyHash {unKeyHash = "8163bc1d679f90d073784efdc761288dbc2dc21a352f69238070fc45"})) StakeRefNull)) (TxOutValueShelleyBased ShelleyBasedEraConway (MaryValue (Coin 2000000) (MultiAsset (fromList [])))) TxOutDatumNone ReferenceScriptNone),(TxIn "c9a733c945fdb7819648a58d7d6b9a30af2ac458a27f5bb7e9c41f92da82ba2c" (TxIx 0),TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraConway) (ShelleyAddress Testnet (KeyHashObj (KeyHash {unKeyHash = "8163bc1d679f90d073784efdc761288dbc2dc21a352f69238070fc45"})) StakeRefNull)) (TxOutValueShelleyBased ShelleyBasedEraConway (MaryValue (Coin 2000000) (MultiAsset (fromList [])))) TxOutDatumNone ReferenceScriptNone)]}
but got: OnFanoutTx {headId = UnsafeHeadId "eK+\SO_\243\224\169\STX\161)\CAN\182\&6(\205\212x6L\170O\fu\142ot\144", fanoutUTxO = fromList [(TxIn "880c3d807a48d432788158f879a81a5ddc6c1ad6527fe70922175e621ea08092" (TxIx 0),TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraConway) (ShelleyAddress Testnet (ScriptHashObj (ScriptHash "0e35115a2c7c13c68ecd8d74e4987c04d4539e337643be20bb3274bd")) StakeRefNull)) (TxOutValueShelleyBased ShelleyBasedEraConway (MaryValue (Coin 4879080) (MultiAsset (fromList [(PolicyID {policyID = ScriptHash "654b2b0e5ff3e0a902a12918b63628cdd478364caa4f0c758e6f7490"},fromList [("4879647261486561645631",1),("f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d",1)])])))) (TxOutDatumInline BabbageEraOnwardsConway (HashableScriptData "\216{\159\216y\159X\FSeK+\SO_\243\224\169\STX\161)\CAN\182\&6(\205\212x6L\170O\fu\142ot\144\159X \213\191J?\204\231\ETB\176\&8\139\204'I\235\193H\173\153i\178?E\238\ESC`_\213\135xWj\196\255\216y\159\EM'\DLE\255\NUL\SOHX \193\211\DC4E\234\252\152\157\239\186\RSmVF\141\208\218\135\141\160{\fYFq\245\SOH\148\nOS\DC1X \227\176\196B\152\252\FS\DC4\154\251\244\200\153o\185$'\174A\228d\155\147L\164\149\153\ESCxR\184UX \227\176\196B\152\252\FS\DC4\154\251\244\200\153o\185$'\174A\228d\155\147L\164\149\153\ESCxR\184U\128\ESC\NUL\NUL\SOH\149\214\218\152\136\255\255" (ScriptDataConstructor 2 [ScriptDataConstructor 0 [ScriptDataBytes "eK+\SO_\243\224\169\STX\161)\CAN\182\&6(\205\212x6L\170O\fu\142ot\144",ScriptDataList [ScriptDataBytes "\213\191J?\204\231\ETB\176\&8\139\204'I\235\193H\173\153i\178?E\238\ESC`_\213\135xWj\196"],ScriptDataConstructor 0 [ScriptDataNumber 10000],ScriptDataNumber 0,ScriptDataNumber 1,ScriptDataBytes "\193\211\DC4E\234\252\152\157\239\186\RSmVF\141\208\218\135\141\160{\fYFq\245\SOH\148\nOS\DC1",ScriptDataBytes "\227\176\196B\152\252\FS\DC4\154\251\244\200\153o\185$'\174A\228d\155\147L\164\149\153\ESCxR\184U",ScriptDataBytes "\227\176\196B\152\252\FS\DC4\154\251\244\200\153o\185$'\174A\228d\155\147L\164\149\153\ESCxR\184U",ScriptDataList [],ScriptDataNumber 1743066405000]]))) ReferenceScriptNone)]}
-
So it seems like there is a script output in the observed UTxO with 4879080 lovelace and some tokens. It looks like this is a head output, while what we expect are distributed outputs to the hydra-node parties containing the fanout amounts.
-
These head assets that I see should have been burned already? We get this utxo in the observation using
let inputUTxO = resolveInputsUTxO utxo tx
-
If I use
(headInput, headOutput) <- findTxOutByScript inputUTxO Head.validatorScript
UTxO.singleton (headInput, headOutput)
then the utxo is the same which is expected.
-
How come the fanout tx does not contain pub key outputs?
-
If I use
utxoFromTx fanoutTx
then I get the expected pub key outputs:
1) Test.DirectChain can open, close & fanout a Head
expected: OnFanoutTx {headId = UnsafeHeadId "eK+\SO_\243\224\169\STX\161)\CAN\182\&6(\205\212x6L\170O\fu\142ot\144", fanoutUTxO = fromList []}
but got: OnFanoutTx {headId = UnsafeHeadId "eK+\SO_\243\224\169\STX\161)\CAN\182\&6(\205\212x6L\170O\fu\142ot\144", fanoutUTxO = fromList [(TxIn "431e45c0048e0aa104deaca1e8aca454c85efd71c52948e418d9119fd8cdf7b3" (TxIx 0),TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraConway) (ShelleyAddress Testnet (KeyHashObj (KeyHash {unKeyHash = "4e932840c5d2d3664237149fd3e9ba09c531581126fbdbab073c31ce"})) StakeRefNull)) (TxOutValueShelleyBased ShelleyBasedEraConway (MaryValue (Coin 2000000) (MultiAsset (fromList [])))) TxOutDatumNone ReferenceScriptNone),(TxIn "431e45c0048e0aa104deaca1e8aca454c85efd71c52948e418d9119fd8cdf7b3" (TxIx 1),TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraConway) (ShelleyAddress Testnet (KeyHashObj (KeyHash {unKeyHash = "f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d"})) StakeRefNull)) (TxOutValueShelleyBased ShelleyBasedEraConway (MaryValue (Coin 90165992) (MultiAsset (fromList [])))) TxOutDatumNone ReferenceScriptNone)]}
but the overall test is red since we construct artificial TxIns in utxoFromTx
-
I created `findPubKeyOutputs` to match on all pub key outputs, and then I see the expected outputs, but they also contain the change output that returns some ada to the hydra-node wallet. Life is not simple.
In the end I changed all tests that match exactly on final utxo to make sure that subset of final utxo is there (disregarding the change output).
-
Changes in fanout observation boiled down to
findPubKeyOutputs $ utxoFromTx tx
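Conceptually the new observation is just this - toy types only, the real code works on cardano-api addresses and hydra's UTxO type:

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

data Credential = PubKeyCredential String | ScriptCredential String

data SimpleTxOut = SimpleTxOut
  { txOutCredential :: Credential
  , txOutLovelace :: Integer
  }

-- Keep only outputs locked by a public key (party funds and the change
-- output) and drop script outputs such as the head output itself.
findPubKeyOutputs :: Map txin SimpleTxOut -> Map txin SimpleTxOut
findPubKeyOutputs = Map.filter (isPubKey . txOutCredential)
 where
  isPubKey PubKeyCredential{} = True
  isPubKey ScriptCredential{} = False
```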
-
Midnight people have reported that they still see some memory issues when loading a huge state file from disk.
-
The main problem is making sure the fix works; I still don't have a good idea of how to verify that my changes reduce the memory consumption.
-
Problem lies in this piece of code:
(lastEventId, (headState, chainStateHistory)) <-
runConduitRes $
sourceEvents eventSource
.| getZipSink
( (,)
<$> ZipSink (foldMapC (Last . pure . getEventId))
<*> ZipSink recoverHeadStateC
)
...
recoverHeadStateC =
mapC stateChanged
.| getZipSink
( (,)
<$> ZipSink (foldlC aggregate initialState)
<*> ZipSink (foldlC aggregateChainStateHistory $ initHistory initialChainState)
)
and of course the way we create PersistenceIncremental
which is responsible
for reading the file (sourceEvents eventSource
part).
sourceFileBS fp
.| linesUnboundedAsciiC
.| mapMC
( \bs ->
case Aeson.eitherDecodeStrict' bs of
Left e -> ...
Right decoded -> ...
)
-
Initially I noticed the usage of `foldlC`, which is strict, and thought perhaps this is the problem, but I could not find a lazy alternative and in general I don't believe this is the real issue.
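For completeness: `foldlC` only forces the accumulator to WHNF, so a lazy state record can still pile up thunks inside it. One blunt, sketch-level alternative is to deep-force the state on every step (names here are mine, not the hydra-node code); making the relevant fields strict is the nicer fix.

```haskell
import Conduit (ConduitT, foldlC)
import Control.DeepSeq (NFData, force)

-- Fully evaluate the new state on every fold step so no thunks accumulate
-- inside the record, at the cost of an NFData constraint.
aggregateC :: (Monad m, NFData state) => (state -> event -> state) -> state -> ConduitT event o m state
aggregateC aggregate = foldlC (\s e -> force (aggregate s e))
```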
I am more keen to investigate this code:
sourceFileBS fp
.| linesUnboundedAsciiC
.| mapMC ...
-
`linesUnboundedAsciiC` could be the cause, since I believe it is converting the whole stream:
  > Convert a stream of arbitrarily-chunked textual data into a stream of data where each chunk represents a single line. Note that, if you have unknown/untrusted input, this function is unsafe, since it would allow an attacker to form lines of massive length and exhaust memory.
-
I also found an interesting function, `peekForeverE`, that should "run a consuming conduit repeatedly, only stopping when there is no more data available from upstream".
Could I use benchmarks to simulate heavy load from disk?
-
I just tried running the benchmarks with and without the one-line change, and it seems like the memory consumption is reduced:
- BEFORE ->
Average confirmation time (ms): 57.599974154
P99: 76.48237684999998ms
P95: 67.55752405ms
P50: 56.9354805ms
Invalid txs: 0
### Memory data
| Time | Used | Free |
|------|------|------|
| 2025-03-27 15:39:59.474482067 UTC | 14.2G | 35.1G |
| 2025-03-27 15:40:04.474412824 UTC | 14.4G | 34.9G |
| 2025-03-27 15:40:09.474406479 UTC | 14.4G | 34.9G |
| 2025-03-27 15:40:14.474403701 UTC | 14.4G | 34.8G |
| 2025-03-27 15:40:19.47445777 UTC | 14.4G | 34.8G |
| 2025-03-27 15:40:24.474392458 UTC | 14.4G | 34.8G |
| 2025-03-27 15:40:29.474439923 UTC | 14.4G | 34.8G |
| 2025-03-27 15:40:34.474408859 UTC | 14.5G | 34.7G |
| 2025-03-27 15:40:39.474436556 UTC | 14.4G | 34.7G |
| 2025-03-27 15:40:44.474414945 UTC | 14.5G | 34.7G |
Confirmed txs/Total expected txs: 300/300 (100.00 %)
Average confirmation time (ms): 9.364033643
P99: 19.919154109999997ms
P95: 15.478096ms
P50: 7.7630015ms
Invalid txs: 0
### Memory data
| Time | Used | Free |
|------|------|------|
| 2025-03-27 15:40:55.995225272 UTC | 14.2G | 35.1G |
| 2025-03-27 15:41:00.995294779 UTC | 14.2G | 35.1G |
| 2025-03-27 15:41:05.995309124 UTC | 14.2G | 35.1G |
| 2025-03-27 15:41:10.995299687 UTC | 14.3G | 35.0G |
| 2025-03-27 15:41:15.995284362 UTC | 14.3G | 35.0G |
| 2025-03-27 15:41:20.995281122 UTC | 14.3G | 35.0G |
- AFTER ->
Average confirmation time (ms): 57.095020378
P99: 72.8903286ms
P95: 66.89188805ms
P50: 57.172249ms
Invalid txs: 0
### Memory data
| Time | Used | Free |
|------|------|------|
| 2025-03-27 15:37:47.726878831 UTC | 13.7G | 35.6G |
| 2025-03-27 15:37:52.726824668 UTC | 13.9G | 35.5G |
| 2025-03-27 15:37:57.726768654 UTC | 14.0G | 35.3G |
| 2025-03-27 15:38:02.72675874 UTC | 14.0G | 35.3G |
| 2025-03-27 15:38:07.726756126 UTC | 14.0G | 35.3G |
| 2025-03-27 15:38:12.726795633 UTC | 14.0G | 35.2G |
| 2025-03-27 15:38:17.726793141 UTC | 14.1G | 35.2G |
| 2025-03-27 15:38:22.726757309 UTC | 14.1G | 35.1G |
| 2025-03-27 15:38:27.726764279 UTC | 14.1G | 35.1G |
| 2025-03-27 15:38:32.726781991 UTC | 14.1G | 35.1G |
Confirmed txs/Total expected txs: 300/300 (100.00 %)
Average confirmation time (ms): 9.418157436
P99: 19.8506584ms
P95: 15.841609050000002ms
P50: 7.821248000000001ms
Invalid txs: 0
### Memory data
| Time | Used | Free |
|------|------|------|
| 2025-03-27 15:38:45.195815881 UTC | 13.8G | 35.6G |
| 2025-03-27 15:38:50.195894922 UTC | 14.0G | 35.4G |
| 2025-03-27 15:38:55.19592388 UTC | 13.8G | 35.5G |
| 2025-03-27 15:39:00.195971592 UTC | 14.1G | 35.2G |
| 2025-03-27 15:39:05.195891924 UTC | 14.3G | 35.0G |
| 2025-03-27 15:39:10.195897911 UTC | 14.3G | 35.0G |
-
I think I could try to keep the state file after running the benchmarks and then try to start a hydra-node using this (hopefully huge) state file and then peek into prometheus metrics to observe reduced memory usage.
-
What I find weird is that the same persistence functions are used in the API server, but there are no reported leaks there - perhaps it boils down to how we consume this stream?
-
Managed to get a state file with over 300k events so let's see if we can measure reduced usage.
-
This is my invocation of hydra-node so I can copy paste it when needed:
./result/bin/hydra-node \
--node-id 1 --api-host 127.0.0.1 \
--monitoring-port 6000 \
--hydra-signing-key /home/v0d1ch/code/hydra/memory/state-0/me.sk \
--hydra-scripts-tx-id "8f46dbf87bd7eb849c62241335fb83b27e9b618ea4d341ffc1b2ad291c2ad416,25f236fa65036617306a0aaf0572ddc1568cee0bc14aee14238b1196243ecddd,59a236ac22eb1aa273c4bcd7849d43baddd8fcbc5c5052f2eb074cdccbe39ff4" \
--cardano-signing-key /home/v0d1ch/code/hydra/memory/1.sk \
--ledger-protocol-parameters /home/v0d1ch/code/hydra/memory/state-0/protocol-parameters.json \
--testnet-magic 42 \
--contestation-period 10 \
--deposit-deadline 10 \
--node-socket /home/v0d1ch/code/hydra/memory/node.socket \
--persistence-dir /home/v0d1ch/code/hydra/memory/state-0
- Didn't find the time to properly connect some tool to measure the memory, but by looking at the timestamps between the `LoadindState` and `LoadedState` traces I can see that the new changes give MUCH better performance:
With the current master:
start loading timestamp":"2025-03-27T17:12:08.57862623Z
loaded timestamp":"2025-03-27T17:12:28.991870713Z
With one-liner change:
start loading timestamp":"2025-03-27T16:58:54.055623085Z
loaded timestamp":"2025-03-27T16:59:15.05648201Z
- It looks like it took us 20 seconds to load around 335 mb state file and new change reduces this to around a second!
-
It seems like we have a bug in displaying our pending deposits, where deposits that are already incremented or recovered still show up when doing a request to the hydra-node API `/commits` endpoint.
I extended one e2e test we had related to pending deposits and added one check after all others where I spin up again two hydra-nodes and call the endpoint to see if all pending deposits are cleared.
✦ ➜ cabal test hydra-cluster --test-options='--match="can see pending deposits" --seed 278123554'
-
The test seems flaky but in general it almost always fails.
-
From just looking at the code I couldn't see anything weird
-
Found one weird thing: I asserted that in the node-1 state file there are three `CommitRecorded` and three `CommitRecovered`, but in the node-2 state file two `CommitRecovered` are missing.
Is the whole bug related to who does the recovering/recording?
-
The test outcome, although red, shows correct txids for the non-recovered txs.
-
We only assert node-1 sees all
CommitRecover
messages but don't do it for the node-2 since that node is shut down at this point (in order to be able to prevent deposits from kicking in). -
Is this a non-issue after all? I think so, since we stop one node and then try to assert that, after restart, it sees some other commits being recovered, but those were never recorded in that node's local state. What is weird is that the test was flaky, yet using a constant seed always yields the same results.
-
If a node fails to see
OnIncrementTx
then the deposit is stuck in pending local state forever.
-
Currently I had to sprinkle `threadDelay` here and there in the model tests, since otherwise they hang for a long time and eventually (I think) report the shrunk values that fail the test.
This problem is visible mainly in CI where the resources available are not so big, locally the same tests pass.
-
If I remove the threadDelay the memory grows really big and I need to kill the process.
-
This started happening when I had to replace `GetUTxO`, which no longer exists, with `queryState`.
-
I looked at this with NS and found out that we were not waiting for all nodes to see a `DecommitFinalized` - we were only waiting for our own node to see it. This seems to have fixed the model test, which was a bit surprising, since I expected a lot of problems in finding out what went wrong.
-
The situation is that we are unable to close because of
H13
MustNotChangeVersion
-
This happens because the version in the input datum (open datum) does not match with the version in the output (close datum).
-
Local state says I am on version 3 and onchain it seems the situation is the same - 3! But this can't be since the onchain check would pass then. This is how the datum looks https://preview.cexplorer.io/datum/8e4bd7ac38838098fbf23e5702653df2624bcfa4cf0c5236498deeede1fdca78
-
Looking at the state it seems like we try to close, the snapshot version contains correct version (3) but
openVersion
is still at 2:
...
"utxoToCommit": null,
"utxoToDecommit": null,
"version": 3
},
"tag": "ConfirmedSnapshot"
},
"headId": "50bb0874ae28515a2cff9c074916ffe05500a3b4eddea4178d1bed0b",
"headParameters": {
"contestationPeriod": 300,
"parties": [
...
"openVersion": 2,
"tag": "CloseTx"
},
"tag": "OnChainEffect"
}
-
The question is how did we get to this place? It must be that my node didn't observe and emit one `CommitFinalized`, which is when we do the version update - upon increment observation.
There are 24 lines with a `CommitFinalize` message - they only go up to version 2 - while there are 36 lines with `CommitRecorded` - it seems like one recorded commit was not finalized for whatever reason.
OnIncrementTx
shows up 8 times in the logs but in reality it is tied to only two increments so the third one was never observed. -
OnDepositTx
shows up 12 times in the logs but they are related to only two deposits. -
Could it be that the decommit failed instead?
-
There is one
DecommitRecorded
and oneDecommitFinalized
so it seems good. -
Seems like we have
CommitRecorded
for:"utxoToCommit":{"4b31dd7db92bde4359868911c1680ea28c0a38287a4e5b9f3c07086eca1ac26a#0"
"utxoToCommit":{"4b31dd7db92bde4359868911c1680ea28c0a38287a4e5b9f3c07086eca1ac26a#1"
"utxoToCommit":{"22cb19c790cd09391adf2a68541eb00638b8011593b3867206d2a12a97f4bf0d#0"
-
We received
CommitFinalized
for: -
"theDeposit":"44fa1bc9b04d2ffee50fd84088517c3f7b530353834e7c678fdd05073881cb40"
- "theDeposit":"5b93f95068148482a1e27979517e8ab467f85e72551cfc9baaa2086a60e7353a"
-
So one commit was never finalized but it is a bit hard to connect recorded and finalized commits.
-
OnDepositTx
was seen for txids: - 44fa1bc9b04d2ffee50fd84088517c3f7b530353834e7c678fdd05073881cb40 - 5b93f95068148482a1e27979517e8ab467f85e72551cfc9baaa2086a60e7353a - 83e7c36a9d4727e00169409f869d0f94737672c7e87850632b9efe1637f8ef8f -
OnIncrementTx
was seen for:- 44fa1bc9b04d2ffee50fd84088517c3f7b530353834e7c678fdd05073881cb40
- 5b93f95068148482a1e27979517e8ab467f85e72551cfc9baaa2086a60e7353a so we missed to observe deposit `83e7c36a9d4727e00169409f869d0f94737672c7e87850632b9efe1637f8ef8f https://preview.cexplorer.io/tx/83e7c36a9d4727e00169409f869d0f94737672c7e87850632b9efe1637f8ef8f#data
-
Question is what to do with this Head? Can it be closed somehow?
-
We should query the deposit address to see what kind of UTxOs are available there.
-
Added an endpoint to GET the latest confirmed snapshot, which is needed to construct the side-load request, but it does not include information about the latest seen snapshot. Waiting on pull#1860 to enhance it.
-
In our scenario, the head got stuck on InitialSnapshot. This means that during side-loading, we must act similarly to clear pending transactions (pull#1840).
-
Wonder if the side-loaded snapshot version should be exactly the same as the current one, given that version bumping requires L1 interaction.
-
Also unclear if we should validate utxoToCommit and utxoToDecommit on the provided snapshot to match the last known state.
-
Concerned that a head can become stuck during a Recover or Decommit client input.
-
SideLoadSnapshot is the first ClientInput that contains a headId and must be verified when received by the node.
-
Uncertain whether WaitOnNotApplicableTx for localTxs not present in the side-loaded confirmed snapshot would trigger automatic re-submission.
-
I think this feature should not be added to TUI since it is not part of the core protocol or user journey.
-
Now that we have a head stuck on the initial snapshot, I want to explore how we can introspect the node state from the client side, as this will be necessary to create the side-load request.
-
Projecting the latest SnapshotConfirmed seems straightforward, but projecting the latest SeenSnapshot introduces code duplication in HeadLogic.aggregate and the ServerOutput projection.
-
These projections currently conflict heavily with pull#1860. For that reason, we are postponing these changes until it is merged.
-
We need to break down withHydraNode into several pieces to allow starting a node with incorrect ledger-protocol-params in its running configuration.
-
In this e2e scenario, we exercise a three-party network where two nodes (node-1 and node-2) are healthy, and one (node-3) is misconfigured. In this setup, node-1 attempts to submit a NewTx which is accepted by both healthy members but rejected by node-3. Then, when node-3 goes offline and comes back online using healthy pparams, it is expected to stop cooperating and cause the head to become stuck.
-
It seems that after node-3 comes back online, it only sees a PeerConnected message within 20s. Adding a delay for it to catch up does not help. From its logs, we don’t see messages for WaitOnNotApplicableTx, WaitOnSeenSnapshot, or DroppedFromQueue.
-
If node-3 tries to re-submit the same transaction, it is now accepted by node-3 but rejected by node-1 and node-2 due to ValueNotConservedUTxO (because it was already applied). Since node-3 is not the leader, we don’t see any new SnapshotRequested round being signed.
-
Node-1 and node-2 have already signed and observed each other signing for snapshot number 1, while node-3 has not seen anything. This means node-1 and node-2 are waiting for node-3 to sign in order to proceed. Now the head is stuck and won’t make any progress because node-3 has stopped cooperating.
-
New issue raised for head getting stuck issue#1773, which proposes to forcibly sync the snapshots of the hydra-nodes in order to align local ledger states.
-
Updating the sequence diagram for a head getting stuck using latest findings.
-
Now thinking about how we could "Allow introspection of the current snapshot in a particular node", as we want to be able to notice if the head has become stuck. We want to be able to observe who has not yet signed the current snapshot in flight (which is what prevents it from getting confirmed).
-
Noticed that in onOpenNetworkReqTx we keep TransactionReceived even if not applicable, resulting in a list with potentially duplicate elements (in case of resubmission).
-
Given that a head becoming stuck is an L2 issue due to network connectivity, I’m considering whether we could send more information about the local ledger state as part of PeerConnected to trigger auto-sync recovery based on discrepancies. Or perhaps we should broadcast InvalidTx instead?
-
Valid idea to explore after side-load.
-
Trying to reproduce a head becoming stuck in BehaviorSpec when a node starts with an invalid ledger.
-
Having oneMonth in BehaviorSpec's waitUntilMatch makes debugging harder. Reduced it to (6 * 24 * 3), allowing full output visibility.
-
After Bob reconnects using a valid ledger, we expected him to accept the transaction if re-submitted by him, but he rejects it instead.
-
It's uncertain whether Bob is rejecting the resubmission or something else, so I need to wait until all transactions are dropped from the queue.
-
Found that when Bob is resubmitting, he is in Idle state when he is expected to restart in Initial state.
-
This is interesting, as if a party suffers a disk error and loses persistence, side-loading may allow it to resume up to a certain point in time.
-
The idea is valid, but we should not accept a side-load when in Idle state—only when in Open state.
-
It seems this is the first time we attempt to restart a node in BehaviorSpec. Now checking if this is the right place or if I should design the scenario differently.
-
When trying to restart the node from existing sources, we noticed the need to use the
hydrate
function. This suggests we should not force reproducing this scenario in BehaviorSpec. -
NodeSpec does not seem to be the right place either, as we don't have multiple peers connected to each other.
-
Trying to reproduce the scenario at the E2E level, now running on top of an etcd network.
- Continuing where I left off yesterday - to fix a single test that should throw
IncorrectAccessException
but instead I saw yesterday:
uncaught exception: IOException of type ResourceBusy
- When I sprinkle some
spy'
to see the values of actually thread ids I don't get this exception anymore, just the test fails. So the exception is tightly coupled with how we check for threads in thePersistenceIncremental
handle. - I tried labeling the threads and using
throwTo
fromMonadFork
but the result is the same. - Tried using
withBinaryFile
in bothsource
andappend
and useconduit
to stream from/to file but that didn't help. - Tried using
bracket
withopenBinaryFile
and then sink/source handle in the callback but the results are the same. - What is happening here?
-
There are only two problems left to solve here. First one being the
IncorrectAccessException
from persistence in the cluster tests. This one I have a plan on how to solve (have a way to register a thread that will append) and the other problem is some cluster tests fail since appropriate message was not observed. -
One example test is persistence can load with empty commit.
-
I wanted to verify whether the messages are coming through, since the test fails at `waitFor`, and I see the messages propagated (but I don't see `HeadIsOpened` twice!)
Looking at the messages, the `Greetings` message does not contain the correct `HeadStatus` anymore! There was a projection that made sure to update this field in the `Greetings` message, but now we shuffled things around and I don't think this projection works any more.
I see all messages correct (except headStatus in
Greetings
) but only propagated once (and we do restart the node in our test). -
I see api server being spun up twice but second time I don't see message replay for some reason.
-
One funny thing is I see
ChainRollback
- perhaps something around this is broken? -
I see one rebase mistake in
Monitoring
module that I reverted. -
After some debugging I notice that the history loaded from the conduit is always empty list. This is the cause of our problems here!
-
Still digging around code to try and figure out what is happening. I see
HeadOpened
saved in persistence file and can't for the life of me figure out why it is not loaded on restart. I tried even passing in the complete intact event source conduit to make sure I am not consuming the conduit in theServer
leaving it empty for theWSServer
but this is not the problem I am having. -
I remapped all projections to work with
StateChanged
instead ofServerOutput
since it makes no sense to remap toServerOutput
just for that. -
Suspecting that `mapWhileC` is the problem, since it would stop each time it can't convert some `StateEvent` to a `ServerOutput` from disk!
This was it - `mapWhileC` stops when it encounters `Nothing`, so it was not processing the complete list of events! So happy to fix this.
Next is to tackle the
IncorrectAccessException
from persistence. I know why this happens (obviously we try to append from different thread) and sourcing the contents of a persistence file should not be guarded by correct thread id. In fact, we should allow all possible clients to accept (streamed) persistence contents and make sure to only append from one thread and that is the one in which hydra-node process is actually running. -
I added another field to
PersistenceIncremental
calledregisterThread
and it's sole purpose is to register a thread in which we run in - so that we are able to append (I also removed the check for thread id fromsource
and moved it toappend
) -
Ok, this was not the fix I was looking for. The `registerThread` is hidden in the persistence handle, so if you don't have access to it from the outside, how would you register a thread (for example in our tests)?
-
I ended up registering a thread id on
append
if it doesn't exist and do a check if it is there but see one failure:
test/Hydra/PersistenceSpec.hs:59:5:
1) Hydra.Persistence.PersistenceIncremental it cannot load from a different thread once having started appending
uncaught exception: IOException of type ResourceBusy
/tmp/hydra-persistence-33802a411f862b7a/data: openBinaryFile: resource busy (file is locked)
(after 1 test)
[]
[String "WT",Null,Null]
I still need to investigate.
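What I am effectively after is something like this - a sketch of the "register on first append" idea, not the actual `PersistenceIncremental` code:

```haskell
import Control.Concurrent (myThreadId)
import Control.Concurrent.STM (atomically, newTVarIO, readTVar, writeTVar)
import Control.Exception (Exception, throwIO)

data IncorrectAccessException = IncorrectAccessException
  deriving (Show)

instance Exception IncorrectAccessException

-- Remember the first thread that appends and reject appends from any other
-- thread, while leaving reads/sourcing completely unguarded.
mkAppendGuard :: IO (IO ())
mkAppendGuard = do
  ownerVar <- newTVarIO Nothing
  pure $ do
    me <- myThreadId
    ok <- atomically $ do
      owner <- readTVar ownerVar
      case owner of
        Nothing -> writeTVar ownerVar (Just me) >> pure True
        Just tid -> pure (tid == me)
    if ok then pure () else throwIO IncorrectAccessException
```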
- There is no `CommandFailed` and no `ClientEffect`.
- We don't have the `GetUTxO` client input anymore, therefore I had to call the api using a `GET /snapshot/utxo` request to obtain this information (in cluster tests).
- For the tests that don't spin up the api server I used `TestHydraClient` and its `queryState` function to obtain the `HeadState`, which in turn contains the head UTxO.
- One important thing to note is that I had to add `utxoToCommit` to the snapshot projection in order to get the expected UTxO. This was a bug we had and nobody noticed.
- We return the `Greetings` and `InvalidInput` types from the api server without wrapping them into `TimedServerOutput`, which is a bit annoying since now we need to double parse json values in tests. If the decoding fails for `TimedServerOutput` we try to parse just the `ServerOutput`.
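That fallback parse in the tests looks roughly like this (a sketch with stand-in type parameters, assuming `FromJSON` instances for both representations; not the actual test helper):

```haskell
import Data.Aeson (FromJSON, Result (..), Value, fromJSON)

-- Stand-in wrapper; in the real tests these are TimedServerOutput and ServerOutput.
data Decoded timed plain
  = DecodedTimed timed
  | DecodedPlain plain
  deriving (Show)

-- Try the wrapped (timed) representation first, then fall back to the bare output.
decodeOutput ::
  (FromJSON timed, FromJSON plain) =>
  Value ->
  Either String (Decoded timed plain)
decodeOutput v =
  case fromJSON v of
    Success timed -> Right (DecodedTimed timed)
    Error _ ->
      case fromJSON v of
        Success plain -> Right (DecodedPlain plain)
        Error e -> Left e
```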
Current problems:
- After adding `/?history=yes` to the hydra-cluster tests api client I started seeing `IncorrectAccessException` from the persistence. This is weird to me since all we do is read from the persistence event sink.
- Querying the hydra node state in our Model tests to get the Head UTxO (instead of using the `GetUTxO` client input) hangs sometimes and I don't see why. I suspect this has something to do with threads spawned in the model tests. This is the diff, it looks benign:
waitForUTxOToSpend ::
forall m.
- (MonadTimer m, MonadDelay m) =>
+ MonadDelay m =>
UTxO ->
CardanoSigningKey ->
Value ->
TestHydraClient Tx m ->
m (Either UTxO (TxIn, TxOut CtxUTxO))
-waitForUTxOToSpend utxo key value node = go 100
+waitForUTxOToSpend utxo key value node = do
+ u <- headUTxO node
+ threadDelay 1
+ if u /= mempty
+ then case find matchPayment (UTxO.pairs u) of
+ Nothing -> pure $ Left utxo
+ Just (txIn, txOut) -> pure $ Right (txIn, txOut)
+ else pure $ Left utxo
where
- go :: Int -> m (Either UTxO (TxIn, TxOut CtxUTxO))
- go = \case
- 0 ->
- pure $ Left utxo
- n -> do
- node `send` Input.GetUTxO
- threadDelay 5
- timeout 10 (waitForNext node) >>= \case
- Just (GetUTxOResponse _ u)
- | u /= mempty ->
- maybe
- (go (n - 1))
- (pure . Right)
- (find matchPayment (UTxO.pairs u))
- _ -> go (n - 1)
-
matchPayment p@(_, txOut) =
isOwned key p && value == txOutValue txOut
Model tests sometimes succeed, but this is not good enough and we don't want any more flaky tests.
- Started by investigating `hydra-cluster` tests failing, for example this one erroring with:
4) Test.EndToEnd, End-to-end on Cardano devnet, restarting nodes, close of an initial snapshot from re-initialized node is contested
Process "hydra-node (2)" exited with failure code: 1
Process stderr: RunServerException {ioException = Network.Socket.bind: resource busy (Address already in use), host = 0.0.0.0, port = 4002}
- Seems like the `hydra-node` is not shutting down cleanly in scenarios like this
- Isolated test scenarios where we simply expect `withHydraNode` to start/stop and restart within a certain time and not fail
- Testing these tests on master it worked fine?! Seems to have something to do with `etcd`?
- When debugging `withHydraNode` and trying to port it to `typed-process`, I noticed that we don't need the `withHydraNode'` variant really -> merged them
- Back to the tests.. why are they failing while the `hydra-node` binary seems to behave just fine interactively?
- With several `threadDelay` and prints all over the place I saw that the `hydra-node` spawns `etcd` as a sub-process, but when `withProcess` (any of its variants) results in `stopProcess`, the `etcd` child stays alive!
- Issuing a ctrl+c in `ghci` has the `etcd` process log that a signal was detected and it shuts down
- Are we not sending `SIGINT` to the `etcd` process? Tried `interruptProcessGroupOf` in the `Etcd` module
- My handlers (`finally` or `bracket`) are not called!? WTF moment
- Found this issue which mentions that `withProcess` sends `SIGTERM`, which is not handled by default
- So the solution is two-fold:
  - First, we need to make sure to send `SIGINT` to the `etcd` process whenever we are asked to shut down too (in the `Etcd` module)
  - Also, we should initiate a graceful shutdown when the `hydra-node` receives `SIGTERM`
    - This is a better approach than making `withHydraNode` send a `SIGINT` to `hydra-node`
    - While that would work too, dealing with `SIGTERM` in `hydra-node` is more generally useful
    - For example a `docker stop` sends `SIGTERM` to the main process in a container
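A rough sketch of both parts, assuming `typed-process` for the `etcd` child and the `unix` package for signals (function names like `withEtcd` and `waitForShutdownSignal` are placeholders, not the real `Etcd` module code):

```haskell
import Control.Concurrent.MVar (newEmptyMVar, readMVar, tryPutMVar)
import Control.Monad (void)
import System.Posix.Signals (Handler (Catch), installHandler, sigINT, sigTERM, signalProcess)
import qualified System.Process as P
import System.Process.Typed (proc, unsafeProcessHandle, withProcessWait)

-- Part 1: when tearing down, explicitly send SIGINT to the etcd child so it can
-- shut down cleanly; stopProcess alone delivers SIGTERM, which did not do the job here.
withEtcd :: [String] -> IO a -> IO a
withEtcd args action =
  withProcessWait (proc "etcd" args) $ \p -> do
    result <- action
    mpid <- P.getPid (unsafeProcessHandle p)
    mapM_ (signalProcess sigINT) mpid
    -- withProcessWait then waits for etcd to exit on its own
    pure result

-- Part 2: translate SIGTERM into the same graceful-shutdown path as SIGINT,
-- so that e.g. `docker stop` lets hydra-node clean up before exiting.
waitForShutdownSignal :: IO ()
waitForShutdownSignal = do
  shutdownRequested <- newEmptyMVar
  let request = void $ tryPutMVar shutdownRequested ()
  _ <- installHandler sigTERM (Catch request) Nothing
  _ <- installHandler sigINT (Catch request) Nothing
  readMVar shutdownRequested
```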
- When starting to use `grapesy` I had a conflict: `ouroboros-network` needs an older `network` than `grapesy`. Made me drop the ouroboros modules first.
- Turns out we still depend transitively on the `ouroboros-network` packages (via `cardano-api`), but the cabal resolver errors are even worse.
- Adding an `allow-newer: network` still works
- Is it fine to just use a newer version of `network` in `ouroboros-network`?
  - The commits that bumped the upper bound do not indicate otherwise
- Explicitly listed all packages in `allow-newer` and moved on with life
- Working on `PeerConnected` (or an equivalent) for the `etcd` network.
- Changing the inbound type to `Either Connectivity msg` does not work well with the `Authentication` layer?
- The composition using components (ADR7: https://hydra.family/head-protocol/adr/7) is quite complicated and only allows for an all-or-nothing interface out of a component, without much support for optional parts.
- In particular, an `Etcd` component that delivers `Either Connectivity msg` as `inbound` messages cannot be composed easily with the `Authenticate` component that verifies signatures of incoming messages (it would need to understand that this is an `Either` and only do it for `Right msg`).
- Instead, I explore expanding `NetworkCallback` to not only `deliver`, but also provide an `onConnectivity` callback.
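A sketch of that expanded callback record and how a wrapping layer composes with it (simplified; the actual field and type names in the hydra-node network layer may differ):

```haskell
{-# LANGUAGE NamedFieldPuns #-}

import Control.Monad (when)

-- Simplified sketch: the callback a network component invokes, extended with
-- connectivity notifications alongside plain message delivery.
data Connectivity
  = PeerConnected String
  | PeerDisconnected String

data NetworkCallback msg m = NetworkCallback
  { deliver :: msg -> m ()
    -- ^ Invoked for every inbound message.
  , onConnectivity :: Connectivity -> m ()
    -- ^ Invoked when the transport learns about peers (dis)connecting.
  }

-- A layer like Authenticate only wraps 'deliver' and passes 'onConnectivity'
-- through untouched, instead of having to pattern match on an Either.
withAuthentication ::
  Monad m =>
  (msg -> m Bool) -> -- hypothetical signature check
  NetworkCallback msg m ->
  NetworkCallback msg m
withAuthentication verify NetworkCallback{deliver, onConnectivity} =
  NetworkCallback
    { deliver = \msg -> do
        ok <- verify msg
        when ok (deliver msg)
    , onConnectivity
    }
```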
- After designing a more composable `onConnectivity` handling, I wondered how the `Etcd` component would be determining connectivity.
- The `etcdctl` command line tool offers a `member list` command, which returns a list of members if on a majority cluster, e.g.
{"header":{"cluster<sub>id</sub>":8903038213291328342,"member<sub>id</sub>":1564273230663938083,"raft<sub>term</sub>":2},"members":\[{"ID":1564273230663938083,"name":"127.0.0.1:5001","peerURLs":\["<http://127.0.0.1:5001>"\],"clientURLs":\["<http://127.0.0.1:2379>"\]},{"ID":3728543818779710175,"name":"127.0.0.1:5002","peerURLs":\["<http://127.0.0.1:5002>"\],"clientURLs":\["<http://127.0.0.1:2380>"\]}\]}
- But when invoked on a minority cluster it returns
{"level":"warn","ts":"2025-02-17T22:49:48.211708+0100","logger":"etcd-client","caller":"[email protected]/retry<sub>interceptor</sub>.<go:63>","msg":"retrying
of unary invoker
failed","target":"etcd-endpoints://0xc000026000/127.0.0.1:2379","attempt":0,"error":"rpc
error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
- When it cannot connect to an `etcd` instance it returns
{"level":"warn","ts":"2025-02-17T22:49:32.583103+0100","logger":"etcd-client","caller":"[email protected]/retry<sub>interceptor</sub>.<go:63>","msg":"retrying
of unary invoker
failed","target":"etcd-endpoints://0xc0004b81e0/127.0.0.1:2379","attempt":0,"error":"rpc
error: code = DeadlineExceeded desc = latest balancer error: last
connection error: connection error: desc = \\transport: Error while
dialing: dial tcp 127.0.0.1:2379: connect: connection refused\\"}
Error: context deadline exceeded
- When implementing `pollMembers`, suddenly `waitMessages` was not blocked anymore?
- While a little crude, polling `member list` works nicely to get a full list of members (if we are connected to the majority cluster).
- All this will change when we switch to a proper `grpc` client anyways.
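The polling could look roughly like this (a sketch with invented helper names, assuming `etcdctl`'s json write-out flag plus the `typed-process` and `aeson` libraries; not the actual `Etcd` module code):

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import Data.Aeson (FromJSON, eitherDecode)
import GHC.Generics (Generic)
import System.Exit (ExitCode (ExitSuccess))
import System.Process.Typed (proc, readProcessStdout)

-- Shapes matching the relevant bits of `etcdctl member list -w json` output.
newtype MemberList = MemberList {members :: [Member]}
  deriving (Show, Generic)

data Member = Member {name :: String, peerURLs :: [String]}
  deriving (Show, Generic)

instance FromJSON MemberList
instance FromJSON Member

pollMembers :: ([Member] -> IO ()) -> IO ()
pollMembers notify = forever $ do
  (code, out) <- readProcessStdout (proc "etcdctl" ["member", "list", "-w", "json"])
  case (code, eitherDecode out) of
    (ExitSuccess, Right MemberList{members = ms}) -> notify ms
    -- On a minority cluster or unreachable endpoint, etcdctl fails instead of
    -- returning a member list, so we simply report nothing this round.
    _ -> pure ()
  threadDelay 1000000 -- poll once per second (arbitrary)
```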
- Current problem we want to solve: instead of passing a conduit to the `mkProjection` function and running it inside, we would like to stream data to all of the projections we have.
- Seems like this is easier said than done, since we also rely on the projection result, which is a `Projection` handle that is used to update the `TVar` inside.
- I thought it might be a good idea to alter `mkProjection` and make it run in `ConduitT`, so it can receive events, propagate them further and, in the end, return the `Projection` handle.
- I made changes to `mkProjection` that compile:
mkProjection ::
- (MonadSTM m, MonadUnliftIO m) =>
+ MonadSTM m =>
model ->
-- | Projection function
(model -> event -> model) ->
- ConduitT () event (ResourceT m) () ->
- m (Projection (STM m) event model)
-mkProjection startingModel project eventSource = do
- tv <- newTVarIO startingModel
- runConduitRes $
- eventSource .| mapM_C (lift . atomically . update tv)
- pure
+ ConduitT event (Projection (STM m) event model) m ()
+mkProjection startingModel project = do
+ tv <- lift $ newTVarIO startingModel
+ meventSource <- await
+ _ <- case meventSource of
+ Nothing -> pure ()
+ Just eventSource ->
+ void $ yield eventSource .| mapM_C (atomically . update tv)
+ yield $
Projection
{ getLatest = readTVar tv
, update = update tv
but the main issue is that I can't get the results of all the projections we need in the end that easily.
-- does not compile
headStatusP <- runConduitRes $ yield outputsC .| mkProjection Idle projectHeadStatus
- We need to be able to process streamed data from disk and also output like 5 of these projections that do different things.
- I discovered `sequenceConduits`, which allows collecting the conduit result values.
- The idea was to collect all projections which have the capability of receiving events as the conduit input:
[headStatusP] <- runConduit $ sequenceConduits [mkProjection Idle projectHeadStatus] >> sinkList
- Oh, just realized the conduits given to `sequenceConduits` need to have exactly the same type, so my plan just failed.
- I think I need to revisit our approach and start from scratch.
- So what we want to do is to reduce the memory footprint in hydra-node as the final outcome.
- There are a couple of ADRs related to persisting the stream of events and having different sinks that can read from the streams.
- Our API needs to become one of these event sinks.
- The first step is to prevent history output by default, as the history can grow pretty large and it is all kept in memory.
- We need to remove the `ServerOutput` type and map all missing fields to the `StateChanged` type, since that is what we will use to persist the changes to disk.
- I understand that we will keep the existing projections, but they will work on the `StateChanged` type and each change will be forwarded to any existing sinks as the state changes over time.
- We already have the `PersistenceIncremental` type that appends to disk, can we use a similar handle? Most probably yes - but we need to pick the most performant function to write/read to/from disk.
- Seems like we currently use `eventPairFromPersistenceIncremental` to set up the event stream/sink. What we do is load all events from disk. We also have a TVar holding the event id. Ideally, we would like to output every new event in our api server. I should take a look at our projections to see how we output individual messages.
- Ok, yeah, projections are displaying the last message, but looking at this code I am realizing how complex everything is. We should strive for simplicity here.
- Another thought - would it help us to use Servant, at least to separate the routing and handlers? I think it could help, but otoh Servant can get crazy complex really fast.
- So after looking at the relevant code and the issue https://github.com/cardano-scaling/hydra/issues/1618 I believe the most complex thing would be this: "Websocket needs to emit this information on new state changes." But even this is not hard I believe, since we have control of what we need to do when setting up the event source/sink pair.
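As a rough illustration of that control (assumed names and a heavily simplified sink handle; not the actual event source/sink code):

```haskell
import Control.Concurrent.STM (TChan, atomically, dupTChan, newBroadcastTChanIO, writeTChan)

-- Stand-in for the real StateChanged events persisted by the node.
data StateChanged = SomethingChanged deriving (Show)

-- Simplified event sink handle (field name assumed).
newtype EventSink e m = EventSink {putEvent :: e -> m ()}

-- An api-server sink: every persisted event is written to a broadcast channel,
-- and each websocket client duplicates the channel to get its own stream.
mkApiEventSink :: IO (EventSink StateChanged IO, IO (TChan StateChanged))
mkApiEventSink = do
  broadcast <- newBroadcastTChanIO
  let sink = EventSink {putEvent = atomically . writeTChan broadcast}
      subscribe = atomically (dupTChan broadcast)
  pure (sink, subscribe)
```

Each connected client would then stream from its own duplicated channel rather than from a full history kept in memory.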
- Streaming events using `conduit` makes us buy into the `unliftio` and `resourcet` environment. Does this go well with our `MonadThrow` et al classes?
- When using conduits in `createHydraNode`, the `runConduitRes` requires a `MonadUnliftIO` context. We have an `IOSim` usage of this though, and it's not clear if there can even be a `MonadUnliftIO (IOSim s)` instance.
- We are not only loading `[StateEvent]` fully into memory, but also `[ServerOutput]`.
- Made `mkProjection` take a conduit, but then we are running it for each projection (3 times). Should do something with `fuseBoth` or a zip-like conduit combination.
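That zip-like combination could be something in the spirit of conduit's `ZipSink` applicative, which feeds every upstream element to all sinks so the stream is traversed only once (toy fold "projections" over an `Int` stream here, not the real `Projection` handles):

```haskell
import Conduit (ConduitT, foldlC, runConduit, yieldMany, (.|))
import Data.Conduit (ZipSink (..))
import Data.Void (Void)

-- Toy stand-ins for projections folding the same event stream.
countEvents :: Monad m => ConduitT Int Void m Int
countEvents = foldlC (\n _ -> n + 1) 0

sumEvents :: Monad m => ConduitT Int Void m Int
sumEvents = foldlC (+) 0

main :: IO ()
main = do
  -- ZipSink's Applicative supplies every upstream value to both sinks,
  -- so the event stream is consumed once instead of once per projection.
  (count, total) <-
    runConduit $
      yieldMany [1 .. 10 :: Int]
        .| getZipSink ((,) <$> ZipSink countEvents <*> ZipSink sumEvents)
  print (count, total) -- (10,55)
```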
- Started simplifying the `hydra-explorer` and wanted to get rid of all `hydra-node`, `hydra-tx` etc. dependencies, because they include most of the cardano ecosystem. However, on the observer api we will need to refer to cardano specifics like `UTxO` and some hydra entities like `Party` or `HeadId`. So a dependency onto `hydra-tx` is most likely needed.
- Shouldn't these hydra-specific types be in an actual `hydra-api` package? The `hydra-tx` or a future `hydra-client` could depend on that then.
- When defining the observer API I was reaching for the `OnChainTx` data type, as it has json instances and enumerates the things we need to observe. However, this would mean we need to depend on `hydra-node` in the `hydra-explorer`.
- Could use the `HeadObservation` type, but that one is maybe a bit too low level and does not have JSON instances?
- `OnChainTx` is really the level of detail we want (instantiated for cardano transactions, but not corrupted by cardano internal specifics).
- Logging in the main entry point of `Hydra.Explorer` is depending on `hydra-node` anyways. We could be exploring something different to get rid of this? Got https://hackage.haskell.org/package/Blammo recommended to me.
- Got everything to compile (with a cut-off `hydra-chain-observer`). Now I want to have an end-to-end integration test for `hydra-explorer` that does not concern itself with individual observations, but rather checks that the (latest) `hydra-chain-observer` can be used with `hydra-explorer`. That, plus some (golden) testing against the `openapi`
schemas should be enough test coverage.
- Modifying the `hydra` and `hydra-explorer` repositories to integration test the new http-based reporting.
  - Doing so offline from a plane is a bit annoying, as both `nix` or `cabal` would be pulling dependencies from the internet.
  - Working around it using an alias to the `cabal`-built binary:
alias hydra-chain-observer=../../hydra/dist-newstyle/build/x86_64-linux/ghc-9.6.6/hydra-chain-observer-0.19.0/x/hydra-chain-observer/build/hydra-chain-observer/hydra-chain-observer
- `cabal repl` is not picking up the `alias`; maybe I need to add it to `PATH`?
- Adding an `export PATH=<path to binary>:$PATH` to `.envrc` is quite convenient.
- After connecting the two servers via a bounded queue, the test passes, but the sub-processes are not gracefully stopped.
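For reference, the "bounded queue" wiring in the test amounts to something like this (invented names and a stand-in Observation type; not the actual test code):

```haskell
import Control.Concurrent.STM (atomically, newTBQueueIO, readTBQueue, writeTBQueue)

-- Stand-in for whatever the chain observer reports to the explorer.
data Observation = SomeObservation deriving (Show)

main :: IO ()
main = do
  -- Bounded, so a slow consumer applies back-pressure to the producer.
  queue <- newTBQueueIO 100
  let publishObservation = atomically . writeTBQueue queue
      awaitObservation = atomically (readTBQueue queue)
  -- The test would run both servers and exchange observations through these.
  publishObservation SomeObservation
  awaitObservation >>= print
```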
- I created a relevant issue to track this new feature request to enable stake certificates on L2 ledger.
- Didn't plan on working on this right away, but wanted to explore a problem with `PPViewHashesDontMatch` when trying to submit a new tx on L2.
- This happens both when obtaining the protocol-parameters from the hydra-node and when I query them from the cardano-node (the latter is expected to fail on L2, since we reduce the fees to zero).
- I added a line to print the protocol-parameters in our tx printer, and it seems like `changePParams` is not setting the protocol-parameters correctly for whatever reason:
changePParams :: PParams (ShelleyLedgerEra Era) -> TxBodyContent BuildTx -> TxBodyContent BuildTx
changePParams pparams tx =
tx{txProtocolParams = BuildTxWith $ Just $ LedgerProtocolParameters pparams}
- There is
setTxProtocolParams
I should probably use instead. - No luck, how come this didn't work? I don't see why setting the protocol-parameters like this doesn't work....
- I even compared the protocol-parameters loaded into the hydra-node and the ones I get back from hitting the hydra-node api and they are the same as expected
- Running out of ideas
- I want to know why I get mismatch between pparams on L2?
- It is because we start the hydra-node in a separate temp directory from the test driver so I got rid of the problem by querying hydra-node to obtain L2 protocol-parameters
- The weird issue I get is that the budget is overspent and it seems bumping
the
ExecutionUnits
doesn't help at all. - When pretty-printing the L2 tx I noticed that cpu and memory for cert redeemer are both zero so that must be the source of culprit
- Adding separately cert redeemer fixed the issue but I am now back to
PPViewHashesDontMatch
. - Not sure why this happens since I am doing a query to obtain hydra-node protocol parameters and using those to construct the transaction.
- Note that even if I don't change protocol-parameters the error is the same
- This whole chunk of work is to register a script address as a stake certificate and I still need to try to withdraw zero after this is working.
- One thing I wanted to do is to use the dummy script as the provided Data in the Cert Redeemers - is this even possible?
-
When trying to align
aiken
version in our repository with what is generated intoplutus.json
, I encountered errors inhydra-tx
tests even with the same aiken version as claimed. -
Error:
Expected the B constructor but got a different one
-
Seems to originate from
plutus-core
when it tries to run the builtinunBData
on data that is not a B (bytestring) -
The full error in
hydra-tx
tests actually includes what it tried tounBData
:Caused by: unBData (Constr 0 [ Constr 0 [ List [ Constr 0 [ Constr 0 [ B #7db6c8edf4227f62e1233880981eb1d4d89c14c3c92b63b2e130ede21c128c61 , I 21 ] , Constr 0 [ Constr 0 [ Constr 0 [ B #b0e9c25d9abdfc5867b9c0879b66aa60abbc7722ed56f833a3e2ad94 ] , Constr 1 [] ] , Map [(B #, Map [(B #, I 231)])] , Constr 0 [] , Constr 1 [] ] ] , Constr 0 ....
This looks a lot like a script context. Maybe something off with validator arguments? -
How can I inspect the uplc of an aiken script?
-
It must be the "compile-time" parameter of the initial script, which expects the commit script hash. If we use that unapplied on the transaction, the script context trips the validator code.
-
How was the
initialValidatorScript
used on master such that these tests / usages pass? -
Ahh .. someone applied the commit script parameter and stored the resulting script in the
plutus.json
! Most likely usingaiken blueprint apply -v initial
and then passing theaiken blueprint hash -v commit
into that. -
Realized that the
plutus.json
blueprint would have said that a script hasparameters
.