High priority TCP discovery messages

High priority TCP discovery messages

Denis Mekhanikov
A bit of background:
When TcpDiscoverySpi is used, TcpDiscoveryMetricsUpdateMessage is sent by the
coordinator once per metricsUpdateFrequency, which is 2 seconds by default.
It serves as a ping message, which ensures that the ring is connected and
all nodes are alive. These messages have a high priority, i.e., they are
put at the head of the RingMessageWorker's queue instead of its tail.
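To illustrate what "high priority" means here, a minimal sketch with a plain
java.util.Deque (not the actual RingMessageWorker code):

import java.util.ArrayDeque;
import java.util.Deque;

public class HeadVsTailSketch {
    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();

        // Regular discovery messages are appended to the tail.
        queue.addLast("TcpDiscoveryCustomEventMessage");
        queue.addLast("TcpDiscoveryNodeAddedMessage");

        // A high-priority message (e.g. a metrics update) is put at the head,
        // so the worker picks it up before everything queued earlier.
        queue.addFirst("TcpDiscoveryMetricsUpdateMessage");

        // Prints the metrics update first, then the two regular messages.
        // If such messages keep arriving faster than the worker drains the
        // queue, the regular messages at the tail never get processed.
        System.out.println(queue);
    }
}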

Now consider a situation when a single link between two nodes in the ring
works slowly. It may receive and deliver all messages, but with a certain
delay. This is possible when the network is unstable or one of the nodes
experiences a lot of short GC pauses.
It leads to a growing message queue on the node that stands before the
slow link. The worst part is that if high-priority messages are generated
faster than they are processed, then other messages won't even get a chance
to be processed. Thus, no nodes will be kicked out of the cluster, but no
useful progress will happen either. Partition map exchange may hang for this
reason, and the cause won't be obvious from the logs.

JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-10808
I made a draft of a fix for this problem:
https://github.com/apache/ignite/pull/5771
The PR also contains a test that reproduces this situation.

The patch makes sure that only a single TcpDiscoveryMetricsUpdateMessage of
each kind is stored in the queue, i.e., if a newer message comes to the
node, then the old one is discarded. It also makes sure that regular
messages keep being processed: if the last processed message had a high
priority, then a new high-priority message is put at the tail of the queue
instead of its head.
The fix addresses the described problem, but it increases the utilization of
the discovery threads a bit.
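Roughly, the two rules look like this (a simplified standalone sketch with
made-up names; the real change is in the PR above):

import java.util.Deque;
import java.util.LinkedList;

public class DedupQueueSketch {
    static final class Msg {
        final String type;      // e.g. "MetricsUpdate"
        final boolean highPrio;

        Msg(String type, boolean highPrio) {
            this.type = type;
            this.highPrio = highPrio;
        }
    }

    private final Deque<Msg> queue = new LinkedList<>();
    private boolean lastPolledHighPrio;

    synchronized void addMessage(Msg msg) {
        // Rule 1: keep only the newest metrics update; discard the stale one.
        if (msg.type.equals("MetricsUpdate"))
            queue.removeIf(m -> m.type.equals("MetricsUpdate"));

        // Rule 2: a high-priority message jumps to the head only if the
        // previously processed message was a regular one; otherwise it goes
        // to the tail, so regular messages still make progress.
        if (msg.highPrio && !lastPolledHighPrio)
            queue.addFirst(msg);
        else
            queue.addLast(msg);

        notifyAll();
    }

    synchronized Msg pollMessage() throws InterruptedException {
        while (queue.isEmpty())
            wait();

        Msg msg = queue.poll();
        lastPolledHighPrio = msg.highPrio;
        return msg;
    }
}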

What do you think of this fix? Do you have better ideas on how to improve
the heartbeat mechanism to avoid such situations?

Denis

Re: High priority TCP discovery messages

Anton Vinogradov-2
Denis,

Correct me in case I got it all wrong.
Since metrics messages are scheduled, it is possible to end up with more
than one TcpDiscoveryMetricsUpdateMessage inside the queue due to a slow
link or some other reason.

The proposed solution looks like a fix that hides the real problem.

My proposal is:
1) Write a warning message in case a second metrics message is added to the
queue.
By the way, it looks like not only metrics messages should be checked.
Do we have any other scheduled messages?

2) Start a new metrics journey as scheduled, but only in case the previous
journey has finished (a rough sketch is below).
And, again, write a warning message in case a journey was not started.
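A rough sketch of point 2, assuming a hypothetical coordinator-side gate
(the names and the warning text are illustrative, not Ignite APIs):

import java.util.concurrent.atomic.AtomicBoolean;

public class MetricsJourneyGate {
    private final AtomicBoolean journeyInProgress = new AtomicBoolean();

    /** Called by the coordinator's scheduler every metricsUpdateFrequency ms. */
    public boolean tryStartJourney() {
        if (!journeyInProgress.compareAndSet(false, true)) {
            System.err.println("Previous metrics update message has not finished " +
                "its journey around the ring yet; skipping this round.");

            return false;
        }

        return true;
    }

    /** Called when the metrics update message returns to the coordinator. */
    public void onJourneyFinished() {
        journeyInProgress.set(false);
    }
}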


Re: High priority TCP discovery messages

yzhdanov
Denis, what if we remove the priority difference for messages and always add
new ones to the end of the queue?

As for traversing the queue - I don't like O(n) approaches =). So, with all
messages added to the end of the queue (no prio difference), I would suggest
that we save the latest 1st-lap message and the latest 2nd-lap message and
process a metrics message in the message worker thread in queue order if it
is the latest one, and skip it otherwise.
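Something like this, I mean (a simplified standalone sketch, not the actual
worker code):

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class LatestPerLapSketch {
    static final class MetricsMsg {
        final int lap;   // 1st or 2nd pass around the ring
        final long id;   // grows monotonically on the coordinator

        MetricsMsg(int lap, long id) {
            this.lap = lap;
            this.id = id;
        }
    }

    private final BlockingQueue<MetricsMsg> queue = new LinkedBlockingQueue<>();
    private final Map<Integer, Long> latestIdPerLap = new ConcurrentHashMap<>();

    /** Every message goes to the tail; we only remember the newest id per lap. */
    void addMessage(MetricsMsg msg) {
        latestIdPerLap.merge(msg.lap, msg.id, Math::max);
        queue.add(msg);
    }

    /** Worker loop body: process a metrics message only if it is still the latest. */
    void processNext() throws InterruptedException {
        MetricsMsg msg = queue.take();

        if (msg.id < latestIdPerLap.getOrDefault(msg.lap, msg.id))
            return; // a newer message of this lap is already queued - skip this one

        // ... process the metrics update ...
    }
}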

Does this make sense?

--Yakov

Re: High priority TCP discovery messages

Denis Mekhanikov
Yakov,

Sounds good. But there is a flaw in the procedure that you described.
If we have a TcpDiscoveryMetricsUpdateMessage in the queue and a newer one
arrives, then we will consider the existing one obsolete and won't process
it. The newest metrics update message will be put at the queue's tail, thus
delaying the moment when it will be processed.
So, we may never come to a point when an actual
TcpDiscoveryMetricsUpdateMessage is processed.
But if we replace the existing message with the newer one and keep the old
position in the queue, then this approach will work. It will require more
complex synchronization, though, so it will still lead to some overhead.
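Something like this, for example (a simplified sketch on a plain LinkedList,
not the actual queue; note that the scan is still linear):

import java.util.LinkedList;
import java.util.ListIterator;

public class InPlaceReplaceSketch {
    static final class Msg {
        final String type;
        final long id;

        Msg(String type, long id) {
            this.type = type;
            this.id = id;
        }
    }

    private final LinkedList<Msg> queue = new LinkedList<>();

    synchronized void addMessage(Msg msg) {
        if (msg.type.equals("MetricsUpdate")) {
            for (ListIterator<Msg> it = queue.listIterator(); it.hasNext(); ) {
                if (it.next().type.equals("MetricsUpdate")) {
                    // The newer message inherits the old one's queue position,
                    // so it is not pushed back to the tail again and again.
                    it.set(msg);

                    return;
                }
            }
        }

        queue.addLast(msg);
    }
}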

How big may the message worker's queue grow before it becomes a problem? If
it's 20 elements max, then a linear-time check is not that bad.
BTW, the RingMessageWorker#addMessage method already checks for duplicate
messages in the queue for some message types, including all custom discovery
messages, and that check is done in linear time.

Denis


Re: High priority TCP discovery messages

Denis Mekhanikov
I like the idea of making all messages be processed with equal priority.
It will make nodes with overgrown discovery message queues die more often,
though. But maybe this is how it's supposed to work.

Denis


Re: High priority TCP discovery messages

Anton Vinogradov-2
Denis,

+1 to the idea of getting rid of the priority.

Discovery should handle only messages used to update the cluster state and
topology. They are all small and have a high priority.

There is no real reason to use discovery for metrics and similar "read-only"
features.
Maybe a better option is to have a special "discovery-like" channel (with a
ring or an analogue) for metrics-like messages, which would not affect
high-priority features like PME?

Anyway, why are we fighting duplicates inside the queue instead of
preventing the creation of a new message while the previous one has not yet
been processed by the cluster?


Re: High priority TCP discovery messages

yzhdanov
In reply to this post by Denis Mekhanikov
> How big may the message worker's queue grow before it becomes a problem?

Denis, you never know. Imagine a node being flooded with messages because of
increased timeouts and network problems. I remember cases with hundreds of
messages in the queue on large topologies. Please, no O(n) approaches =)

> So, we may never come to a point when an actual
> TcpDiscoveryMetricsUpdateMessage is processed.

Good catch! You can put a hard limit and process an enqueued metrics update
message if the last one of its kind was processed more than
metricsUpdateFrequency milliseconds ago.
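Roughly (an illustrative sketch; the names are made up):

public class StaleMetricsLimitSketch {
    private final long metricsUpdateFrequency; // e.g. 2000 ms
    private long lastProcessedTs;

    StaleMetricsLimitSketch(long metricsUpdateFrequency) {
        this.metricsUpdateFrequency = metricsUpdateFrequency;
    }

    /** Decides whether a dequeued metrics update message should be processed. */
    boolean shouldProcess(boolean isLatestOfItsKind) {
        long now = System.currentTimeMillis();

        // Process it if it is the freshest one, or if no metrics update has
        // been processed for longer than the update frequency.
        if (isLatestOfItsKind || now - lastProcessedTs > metricsUpdateFrequency) {
            lastProcessedTs = now;

            return true;
        }

        return false;
    }
}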

Denis, also note - the initial problem is message queue growth. When we
choose to skip messages, it means that the node cannot process certain
messages and is most probably experiencing problems. We need to think about
killing such nodes. I would suggest we allow queue overflow for 1 minute,
but if the situation does not go back to normal, then the node should fire a
special event and then kill itself (a rough sketch of such a watchdog is
below). Thoughts?
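A rough sketch of such a watchdog (thresholds and hooks are made up):

public class QueueOverflowWatchdog {
    private static final int QUEUE_LIMIT = 1_000;
    private static final long OVERFLOW_TIMEOUT_MS = 60_000;

    private long overflowSince = -1;

    /** Called periodically with the current size of the discovery queue. */
    void onQueueSizeSample(int queueSize) {
        long now = System.currentTimeMillis();

        if (queueSize <= QUEUE_LIMIT) {
            overflowSince = -1; // back to normal

            return;
        }

        if (overflowSince < 0)
            overflowSince = now;
        else if (now - overflowSince > OVERFLOW_TIMEOUT_MS) {
            fireSpecialEvent();  // let listeners know why the node is leaving
            stopLocalNode();     // segment / stop this node
        }
    }

    void fireSpecialEvent() { /* no-op in this sketch */ }

    void stopLocalNode()    { /* no-op in this sketch */ }
}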

--Yakov