Conversation
Notices
-
I'm seeing those "bad SHA-1 HMAC" errors at around 1100-1700 notices in each of my !gnusocial rotated logs (default logrotate settings). A bit disconcerting.
-
@takeshitakenji Bad hashes for the hubsub/feedsub secrets, maybe?
Don't understand the need for secrets for a subscription personally.
-
@maiyannah Something like 6600 bad HMAC errors over the past two months.
-
@takeshitakenji Well, I just mean to say that would be my first inclination as to why those errors are occurring.
-
@maiyannah @mmn @moonman If you're curious about the worst offenders, I put it here: http://u.daggsy.com/2V
There aren't any secrets or HMACs in there; just the users, some event IDs, and partial notice content. Yes, it's Dropbox, but 1.2 MB uncompressed seemed a bit much for a Pastebin analog.
-
@takegrapeakenji @moonman @mmn I probably wouldn't mind dropbox as much if they didn't keep nagging me to get an account.
-
@mmn @moonman @maiyannah Some instances do odd things with the <id> field of the AtomPub, but I didn't want to spend too much time figuring out the perfect way of getting a useful URL.
-
@takeshitakenji @moonman @mmn I notice it seems to be just a few problem instances, so I still feel this is communications-related. Have you had any further problems related to SPC since your fix last night?
-
@maiyannah @mmn @moonman freezepeach.xyz seemed to be the worst as far as that subscription issue goes, but I'm definitely seeing stuff from users I hadn't seen before.
RDN seems to be the most obviously bad one with respect to the bad HMAC errors. It's not alone, though.
-
@takeshitakenji @moonman @mmn I will investigate this further when I manage to drag my ass out of bed and eat, been a real bad day for arthritis. Keep me in the loop though so I know if you fix it in the meantime, thanks :)
-
@takeshitakenji How many of those messages are ones you actually missed? Pretty sure you replied to at least a few of the ones from my instance.
-
@maiyannah I know I've been missing stuff from RDN, and some from Quitter.se. I don't see any of the bad HMAC errors from your instance in my logs since I started my own instance.
-
@takeshitakenji There are a few in that zipped log, but I know you RTd/favd/replied to at least some of those messages.
-
@maiyannah Are they notices that are replies to me?
-
@takeshitakenji As far as I can see, HMAC is used in the API for the secrets exchanged between servers, so from a cursory glance it appears that what is occurring in this case is that the handshake between servers is failing.
In /plugins/ostatus/classes/hubsub.php:
if ($this->secret) {
$hmac = hash_hmac('sha1', $atom, $this->secret);
$headers[] = "X-Hub-Signature: sha1=$hmac";
} else {
$hmac = '(none)';
}
There is similar code in feedsub.
So what is probably happening is that the HTTP header X-Hub-Signature is not what the receiving instance (yours) expected it to be.
-
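[To make the handshake above concrete, here is a minimal Python sketch of what both sides compute; it is not the actual PHP from hubsub.php/feedsub.php, and the function names are made up for illustration. The point is that if the subscriber's stored secret is blank or stale, the comparison fails, which is exactly the "bad SHA-1 HMAC" error path.]

```python
import hashlib
import hmac

def sign_push(atom_body: bytes, secret: str) -> str:
    # Hub side: sign the Atom payload with the shared secret,
    # as the hubsub.php snippet above does with hash_hmac().
    digest = hmac.new(secret.encode(), atom_body, hashlib.sha1).hexdigest()
    return "sha1=" + digest

def verify_push(atom_body: bytes, secret: str, header: str) -> bool:
    # Subscriber side: recompute the HMAC from the locally stored
    # secret and compare it to the X-Hub-Signature header value.
    return hmac.compare_digest(sign_push(atom_body, secret), header)
```
-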
@takeshitakenji By the way, in looking at this I see where feedsub sets a failed subscription to inactive, so I made a note of that in case I can add a retry mechanism.
-
@takeshitakenji The code that actually generates the error you are seeing is in /plugins/ostatus/classes/feedsub.php, around line 489:
protected function validatePushSig($post, $hmac)
It compares the 'secret' hash they sent in that HTTP header against one computed from what is stored locally in the database. That exception is thrown if they do not match.
As to why they don't match, it's likely because the subscription did not complete successfully. I have a theory that this field is blank; maybe check the RDN subscriptions for an affected RDN user and see if there is a 'secret' in the table? I'd give you a query to try but I'm not at my computer :/ But I suspect that since the subscription didn't complete, the subscribing instance never got the secret hash, and as a result has nothing to present at negotiation time for the subscription pushes.
-
@takeshitakenji As far as I can see, there is no check to ensure it's not blank or NULL, so it's very possible this is the problem.
-
@takeshitakenji It occurred to me: feed subscriptions are refreshed on a certain interval to regenerate the secret, so what could also be happening is that the secrets are getting out of sync as a result.
-
@maiyannah Speaking of refreshing, it looks like active subscriptions go inactive when the relevant instance is unavailable. This time, it's quitter.no.
-
@takeshitakenji Which then rejects pushes when they happen, because it's inactive...
I think the nature of our problem is starting to become clear.
-
@maiyannah So we have bad secrets, empty secrets, and inactive subscriptions. Any other things I should scan for in feedsubs?
-
@takeshitakenji What I would pay attention to is a feedsub for an instance you've had issues with which is about to expire, at a time when you can watch the regeneration. That would confirm or deny that bit.
It basically looks like we have two communication errors that cause the same problem:
1] Not properly communicating at the time of subscription, and
2] Not properly communicating when the subscription comes up for renewal.
Both of these are fault-intolerant and have no retry mechanism, and when they fail, you end up without the secret needed to verify the HMAC on the Atom feeds that get pushed.
The inactive feed status is more symptom than cause, from what I can see, but nonetheless related.
-
@maiyannah Some sort of retry and verification mechanism is needed, eh? Wouldn't that need a fix in whatever 'open' protocol is used for that? Diaspora and so on need the fix, too.
-
@takeshitakenji OStatus has a retry mechanism for these things; it's just disabled by default.
But this is GNU social's (and other programs') implementation.
Rather than handling the exception by trying to resolve the issue, it just discards the communication: it handles errors by logging them and then doing precisely nothing, which makes it fault-intolerant. Network communication is inherently unreliable; any framework or standard built on top of it needs to take that into account or it is flawed by design. In this case, I would argue that GNU social not attempting to mitigate packet loss or other miscommunication is a design flaw.
Moreover, it doesn't really tell you it's doing this, as evidenced by the fact that we had to go digging to figure this out.
-
@takeshitakenji At the very least I would propose having a separate 'feedsub' status, distinct from inactive, for a subscription that failed because of an error. GNU social is putting a feed into a non-error state when an error occurs.
-
@takeshitakenji In fact, thinking on it for a moment, I think this would be the ideal solution. If you put the subscription into an error state, you can then have the retry mechanism kick in whenever a sub is in that state (default it to 0 retries, like normal OStatus behaviour, if that bothers you) and you could toss an event hook in there for people who want to handle the error state a different way. This makes it clear the stream had an error, adds optional fault tolerance, and lets you add extended reactive code if you desire, all with what I think would be relatively minimal changes to the underlying software.
-
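[The proposed error state plus event hook could be sketched like this; the class and state names are hypothetical, since GNU social's real FeedSub class has no ERROR state or error hooks.]

```python
from enum import Enum

class SubState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"  # deliberately unsubscribed
    ERROR = "error"        # handshake/renewal failed; eligible for retry

class FeedSub:
    # Hypothetical subscription record illustrating the proposal.
    def __init__(self, uri):
        self.uri = uri
        self.state = SubState.ACTIVE
        self._error_hooks = []

    def on_error(self, hook):
        # Event hook for code that wants to handle failures differently.
        self._error_hooks.append(hook)

    def mark_error(self, reason):
        # Enter an explicit error state instead of "inactive", so a
        # retry pass can find these feeds and attempt to resubscribe.
        self.state = SubState.ERROR
        for hook in self._error_hooks:
            hook(self, reason)
```
-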
@maiyannah My only concern is how slow upstream is at accepting changes.
-
@takeshitakenji Well, the wonder of free and open-source software is that I could just put the modified files up somewhere, or email them to people so they can apply them themselves.
And if it ever gets really bad, you can always fork it.
-
@takeshitakenji I will add this to my monumental gnusocial todo list and see if I can't do it myself next time I'm in the groove/mindset to work on code.
-
@takeshitakenji Before I upgraded from #StatusNet to #GNUsocial, I used to use the "two-step" http://url.federati.net/n2GDo ... but it is untested with GS. May or may not work. May or may not damage your instance.
-