Service grid redesign

Denis Mekhanikov
Igniters,

I'd like to start a discussion on redesigning the Ignite service grid.
Our current architecture has a number of problems that have to be
addressed.

Here are the most severe ones:

One of them is the lack of a guarantee that a service is successfully
deployed and ready for work by the time the IgniteServices.deploy*() methods
return.
Furthermore, if an exception is thrown from the Service.init() method, the
deploying side cannot receive it, or even learn that the service is in an
unusable state.
So you may end up in a situation where you deploy a service without
receiving any errors, then call one of its methods and hang indefinitely
on that invocation.
JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
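
To make the desired contract concrete, here is a minimal sketch in plain
Java with no Ignite dependency. SketchService, deployAsync and
deployAndReport are hypothetical names used only for illustration, not
Ignite API. The only point is that a failure in init() reaches the caller
instead of being swallowed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class DeployContractSketch {
    /** Hypothetical stand-in for a service with an init phase; not the Ignite Service interface. */
    public interface SketchService {
        void init() throws Exception;
    }

    /** Runs init() asynchronously; the returned future fails if init() throws. */
    public static CompletableFuture<Void> deployAsync(SketchService svc) {
        return CompletableFuture.runAsync(() -> {
            try {
                svc.init();
            } catch (Exception e) {
                // Carry the original exception to whoever waits on the future.
                throw new CompletionException(e);
            }
        });
    }

    /** Waits for deployment and returns "deployed" or "failed: <message>". */
    public static String deployAndReport(SketchService svc) {
        try {
            deployAsync(svc).join();
            return "deployed";
        } catch (CompletionException e) {
            return "failed: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(deployAndReport(() -> { }));
        System.out.println(deployAndReport(() -> {
            throw new IllegalStateException("no config");
        }));
    }
}
```

With this shape the deploying side either gets a completed future or the
exception from init(), so it never invokes a service that failed to start.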

Another problem is locking during service deployment on an unstable
topology.
This issue is caused by missing updates in continuous query listeners on
the internal cache.
It is hard to reproduce, but it does happen. We shouldn't allow deployment
methods to hang without reporting anything.
JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259

I think we should change the deployment procedure to make it more reliable.
Moving from operating on the internal replicated service cache to sending
custom discovery events seems to be a good idea.
Service deployment may trigger a discovery event that makes the chosen
nodes deploy the service, and the same event notifies the other nodes about
the deployed service instances.
This eliminates the need for distributed transactions on the internal
replicated system cache and makes the service deployment protocol more
transparent.

There are a few points that should be taken into account, though.

First of all, we can't wait for services to be deployed and initialised in
the discovery thread.
So we need to make the notification about the service deployment result
asynchronous, presumably over the communication protocol.
I can think of a procedure similar to the current exchange protocol, where
service deployment is initiated by an initial discovery message,
followed by asynchronous notifications from the hosting servers over
communication. Finally, one more discovery message notifies all
nodes about the service deployment result and the location of the deployed
service instances. The coordinator is responsible for collecting the
deployment results in this scheme.
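
The three steps above can be sketched as coordinator-side bookkeeping. This
is a single-threaded illustration in plain Java, not the Ignite
implementation: the Coordinator class, its method names and the
"DEPLOYMENT_RESULT" message text are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseDeploymentSketch {
    /** Hypothetical coordinator-side state for a single deployment. */
    public static class Coordinator {
        private final List<String> pending;
        private final Map<String, String> results = new LinkedHashMap<>();

        /** Step 1: the initial discovery message names the nodes chosen to host the service. */
        public Coordinator(List<String> hostingNodes) {
            this.pending = new ArrayList<>(hostingNodes);
        }

        /** Step 2: a hosting node reports its result ("OK" or an error text) over communication. */
        public void onNodeResult(String nodeId, String result) {
            // Ignore reports from nodes that are unknown or have already answered.
            if (pending.remove(nodeId))
                results.put(nodeId, result);
        }

        /** Step 3: returns the final discovery message, or null while results are still missing. */
        public String finalMessage() {
            if (!pending.isEmpty())
                return null;

            return "DEPLOYMENT_RESULT " + results;
        }
    }

    public static void main(String[] args) {
        Coordinator crd = new Coordinator(List.of("node-1", "node-2"));

        crd.onNodeResult("node-2", "OK");
        System.out.println(crd.finalMessage()); // node-1 has not answered yet

        crd.onNodeResult("node-1", "init failed");
        System.out.println(crd.finalMessage());
    }
}
```

The discovery thread only has to deliver the first and the last message;
the slow part, running init() on the hosting nodes, happens outside of it.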

Another problem is failover when some nodes fail during deployment or
later, while the service is running.
The following cases should be handled:

   1. coordinator failure during deployment;
   2. failure of nodes that were chosen to host the service, during
   deployment;
   3. failure of nodes that host deployed services, after the
   deployment.

The first case may be resolved either by continuing the deployment with a
new coordinator or by cancelling it.
The second case requires another node to be chosen and notified. Another
discovery message may be needed for that.
The third case requires redeployment, so the coordinator should track
topology changes and redeploy failed services.
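
For the third case, a rough sketch of what "redeploy failed services" could
look like on the coordinator. The redeploy method and the least-loaded
placement rule are assumptions for illustration only; the thread does not
fix a reassignment policy. The sketch also assumes that at least one node
is still alive and that the failed node is not in the alive list.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FailoverSketch {
    /**
     * Moves every service hosted on the failed node to the alive node that
     * currently hosts the fewest services (ties go to the smallest node id).
     * Throws NoSuchElementException if a service must move and no node is alive.
     */
    public static Map<String, String> redeploy(Map<String, String> serviceToNode,
                                               String failedNode,
                                               List<String> aliveNodes) {
        Map<String, String> result = new TreeMap<>(serviceToNode);

        // Count how many services each alive node hosts right now.
        Map<String, Integer> load = new TreeMap<>();
        for (String node : aliveNodes)
            load.put(node, 0);
        for (String node : serviceToNode.values())
            load.computeIfPresent(node, (k, v) -> v + 1);

        for (Map.Entry<String, String> e : result.entrySet()) {
            if (!e.getValue().equals(failedNode))
                continue;

            // Pick the least loaded alive node and account for the new instance.
            String target = Collections.min(load.entrySet(), Map.Entry.comparingByValue()).getKey();
            e.setValue(target);
            load.merge(target, 1, Integer::sum);
        }

        return result;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of("a", "n1", "b", "n2", "c", "n2");
        System.out.println(redeploy(before, "n2", List.of("n1", "n3")));
    }
}
```

In a real cluster the new assignment would still have to go through the
deployment protocol, so that init() failures on the new host are reported.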

Another good improvement would be service versioning. This matter was
already discussed in another thread:
http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
Let's resume that discussion and state the final decision here.
This feature is closely connected to peer class loading, which currently
does not work for services.
So service versioning should be implemented along with peer class loading.
JIRA ticket for versioning:
https://issues.apache.org/jira/browse/IGNITE-6069
Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975

Please share your thoughts. Constructive criticism is highly appreciated.

Denis

Re: Service grid redesign

dsetrakyan
I think it is about time we take another look at our service functionality.
All the points you have raised sound reasonable to me.

On Fri, Mar 23, 2018 at 6:01 PM, Denis Mekhanikov <[hidden email]>
wrote:

> Igniters,
>
> I'd like to start a discussion on Ignite service grid redesign.

Re: Service grid redesign

Denis Magda-2
In reply to this post by Denis Mekhanikov
Denis,

Thanks for the extensive analysis. There is plenty of room for optimization
on the service grid side.

Yakov, Sam, Alex G.,

How do you like the idea of using the discovery protocol to exchange
service grid system messages? Any pitfalls?


--
Denis


On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov <[hidden email]>
wrote:

> Igniters,
>
> I'd like to start a discussion on Ignite service grid redesign.

Re: Service grid redesign

daradurvs
Hi, Denis Mekhanikov!

As far as I know, Ignite services are based on IgniteCache, so we have
all its features. We can use listeners or continuous queries for
deployment synchronization.

Why do you want to use the discovery layer for that?

One more thing: we can use a baseline approach for services. That means
IgniteServices.deploy() returns a ready-to-work service after
deployment on the baseline nodes, and the service is deployed to other
nodes on demand, for example when the load on it becomes high.

About versioning, maybe it makes sense to extend the public API:
IgniteServices.service(name, version)?

On the first deployment, we can compute the service's hash code (just as
an example) and store it. On a new deployment request for a service with
an existing name, we compute the new service's hash code and compare the
two. If they differ, we deploy the new service as a service with a
different version.
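
The hash code comparison described above could look roughly like this. It
is only a sketch: ServiceVersionSketch is a hypothetical class, and hashing
the service's bytes with Arrays.hashCode is an assumed fingerprint. A hash
collision would hide a real change, so a stronger digest would be needed
in practice.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ServiceVersionSketch {
    /** Last seen fingerprint per service name. */
    private final Map<String, Integer> fingerprints = new HashMap<>();

    /** Current version per service name, starting at 1. */
    private final Map<String, Integer> versions = new HashMap<>();

    /** Registers a deployment and returns the version assigned to it. */
    public int deploy(String name, byte[] serviceBytes) {
        int fingerprint = Arrays.hashCode(serviceBytes);
        Integer known = fingerprints.get(name);

        if (known == null) {
            // First deployment under this name.
            fingerprints.put(name, fingerprint);
            versions.put(name, 1);
        } else if (known != fingerprint) {
            // Same name, different content: treat it as a new version.
            fingerprints.put(name, fingerprint);
            versions.merge(name, 1, Integer::sum);
        }

        return versions.get(name);
    }

    public static void main(String[] args) {
        ServiceVersionSketch registry = new ServiceVersionSketch();

        System.out.println(registry.deploy("svc", new byte[] {1, 2, 3}));
        System.out.println(registry.deploy("svc", new byte[] {1, 2, 3}));
        System.out.println(registry.deploy("svc", new byte[] {1, 2, 4}));
    }
}
```

Redeploying identical content keeps the version, while changed content
under the same name bumps it.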


On Fri, Mar 23, 2018 at 10:03 PM, Denis Magda <[hidden email]> wrote:

> Denis,
>
> Thanks for the extensive analysis. There is a vast room for optimizations
> on the service grid side.



--
Best Regards, Vyacheslav D.

Re: Service grid redesign

Denis Mekhanikov
Vyacheslav,

The service deployment design based on the replicated utility cache has
proven to be unstable and deadlock-prone.
You can find a list of related JIRA issues in my previous letter.

The intention behind the redesign is similar to the binary metadata
redesign that happened in the following ticket: IGNITE-4157
<https://issues.apache.org/jira/browse/IGNITE-4157>
This change in the service deployment procedure will eliminate the need for
another internal replicated cache
and make service deployment more reliable on an unstable topology.

Denis

Tue, Mar 27, 2018 at 23:21, Vyacheslav Daradur <[hidden email]> wrote:

> Hi, Denis Mekhanikov!
>
> As far as I know, Ignite services are based on IgniteCache and we have
> all its features. We can use listeners or continuous queries for
> deployment synchronizations.

Re: Service grid redesign

daradurvs
Denis, thanks for the link.

I looked through the ticket, and I think I understand the point of your redesign now.

Do you have a clear plan or IEP for the whole redesign?

I'm interested in this component and I'd like to take part in the development.



On Mon, Apr 2, 2018 at 2:55 PM, Denis Mekhanikov <[hidden email]> wrote:

> Vyacheslav,
>
> Service deployment design, based on replicated utility cache has proven to
> be unstable and deadlock-prone.



--
Best Regards, Vyacheslav D.

Re: Service grid redesign

Denis Mekhanikov
Vyacheslav,

I've just posted my first draft of the IEP:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Service+grid+improvements
It's not finished yet, but you can get the idea from it.
If you have any thoughts, please let me know and I'll add them
to the IEP.

Denis

Wed, Apr 4, 2018 at 13:09, Vyacheslav Daradur <[hidden email]> wrote:

> Denis, thanks for the link.
>
> I looked through the task and I think that understand your redesign point
> now.
> >> >
> >> > Yakov, Sam, Alex G.,
> >> >
> >> > How do you like the idea of the usage of discovery protocol for the
> >> service
> >> > grid system messages exchange? Any pitfalls?
> >> >
> >> >
> >> > --
> >> > Denis
> >> >
> >> >
> >> > On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov <
> [hidden email]
> >> >
> >> > wrote:
> >> >
> >> >> Igniters,
> >> >>
> >> >> I'd like to start a discussion on Ignite service grid redesign.
> >> >> We have a number of problems in our current architecture, that have
> to
> >> be
> >> >> addressed.
> >> >>
> >> >> Here are the most severe ones:
> >> >>
> >> >> One of them is lack of guarantee, that service is successfully
> deployed
> >> and
> >> >> ready for work by the time, when *IgniteService.deploy*()* methods
> >> return.
> >> >> Furthermore, if an exception is thrown from *Service.init() *method,
> >> then
> >> >> the deploying side is not able to receive it, or even understand,
> that
> >> >> service is in unusable state.
> >> >> So, you may end up in such situation, when you deployed a service
> >> without
> >> >> receiving any errors, then called a service's method, and hung
> >> indefinitely
> >> >> on this invocation.
> >> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
> >> >>
> >> >> Another problem is locking during service deployment on unstable
> >> topology.
> >> >> This issue is caused by missing updates in continuous query
> listeners on
> >> >> the internal cache.
> >> >> It is hard to reproduce, but it happens sometimes. We shouldn't allow
> >> such
> >> >> possibility, that deployment methods hang without saying anything.
> >> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259
> >> >>
> >> >> I think, we should change the deployment procedure to make it more
> >> >> reliable.
> >> >> Moving from operating over internal replicated service cache to
> sending
> >> >> custom discovery events seems to be a good idea.
> >> >> Service deployment may trigger a discovery event, that will make
> chosen
> >> >> nodes deploy the service, and the same event will notify other nodes
> >> about
> >> >> the deployed service instances.
> >> >> It will eliminate the need for distributed transactions on the
> internal
> >> >> replicated system cache, and make the service deployment protocol
> more
> >> >> transparent.
> >> >>
> >> >> There are a few points, that should be taken into account though.
> >> >>
> >> >> First of all, we can't wait for services to be deployed and
> initialised
> >> in
> >> >> the discovery thread.
> >> >> So, we need to make notification about service deployment result
> >> >> asynchronous, presumably over communication protocol.
> >> >> I can think of a procedure similar to the current exchange protocol,
> >> when
> >> >> service deployment is initialised with an initial discovery message,
> >> >> followed by asynchronous notifications from the hosting servers over
> >> >> communication. And finally, one more discovery message will notify
> all
> >> >> nodes about the service deployment result and location of the
> deployed
> >> >> service instances. Coordinator will be responsible for collecting of
> the
> >> >> deployment results in this scheme.
> >> >>
> >> >> Another problem is failover in case, when some nodes fail during
> >> deployment
> >> >> or further work.
> >> >> The following cases should be handled:
> >> >>
> >> >>    1. coordinator failure during deployment;
> >> >>    2. failure of nodes, that were chosen to host the service, during
> >> >>    deployment;
> >> >>    3. failure of nodes, that contain deployed services, after the
> >> >>    deployment.
> >> >>
> >> >> The first case may be resolved by either continuation of deployment
> >> with a
> >> >> new coordinator, or by cancelling it.
> >> >> The second case will require another node to be chosen and notified.
> >> Maybe
> >> >> another discovery message will be needed.
> >> >> The third case will require redeployment, so coordinator should track
> >> >> topology changes and redeploy failed services.
> >> >>
> >> >> Another good improvement would be service versioning. This matter was
> >> >> already discussed in another thread:
> >> >>
> >>
> http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-
> >> >> td20858.html
> >> >> Let's resume this discussion and state the final decision here.
> >> >> This feature is closely connected to peer class loading, which is not
> >> >> working for services currently.
> >> >> So, service versioning should be implemented along with peer class
> >> loading.
> >> >> JIRA ticket for versioning:
> >> >> https://issues.apache.org/jira/browse/IGNITE-6069
> >> >> Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975
> >> >>
> >> >> Please share your thoughts. Constructive criticism is highly
> >> appreciated.
> >> >>
> >> >> Denis
> >> >>
> >>
> >>
> >>
> >> --
> >> Best Regards, Vyacheslav D.
> >>
>
>
>
> --
> Best Regards, Vyacheslav D.
>

Re: Service grid redesign

dsetrakyan
Here is the correct link:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid

I have looked at the tickets there, and I believe that we should not
support peer-deployment for services. It is very hard and I do not think we
should even try.

I am proposing closing this ticket as Won't Fix -
https://issues.apache.org/jira/browse/IGNITE-975

D.


Re: Service grid redesign

Denis Magda
Sorry, it was me who renamed the IEP to "Oil Change in Service Grid"; I was
writing this email after the renaming. I like that title more because it's
fun and highlights what we intend to do: cleaning up our service grid
engine and powering it up with a new "liquid" (a new communication and
deployment approach not available before).

Denis


> This message contains serialized service instance and its configuration.
> It is delivered to the coordinator node first, that calculates the service
> deployment assignments and adds this information to the message.


I would consider using a NodeFilter first to decide where a service can
potentially be deployed. Otherwise, we would require service classes to be on
every node (every node might become a coordinator), which is not a desirable
requirement.
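
To make the idea concrete, here is a minimal sketch in plain Java. It is not Ignite's API: the Node class, the "service.host" attribute and the assign() helper are hypothetical names used only for illustration. The assumption is that the filter is applied to the topology first, so only nodes that pass it are ever considered as hosts.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class NodeFilterSketch {
    /** Hypothetical stand-in for a cluster node: an id plus user attributes. */
    static final class Node {
        final String id;
        final Map<String, String> attributes;

        Node(String id, Map<String, String> attributes) {
            this.id = id;
            this.attributes = attributes;
        }
    }

    /** Keeps only the nodes accepted by the filter, preserving topology order. */
    static List<Node> eligibleNodes(List<Node> topology, Predicate<Node> nodeFilter) {
        return topology.stream().filter(nodeFilter).collect(Collectors.toList());
    }

    /** Picks at most maxInstances hosts, considering eligible nodes only. */
    static List<String> assign(List<Node> topology, Predicate<Node> nodeFilter, int maxInstances) {
        return eligibleNodes(topology, nodeFilter).stream()
            .limit(maxInstances)
            .map(n -> n.id)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> topology = Arrays.asList(
            new Node("n1", Collections.singletonMap("service.host", "true")),
            new Node("n2", Collections.<String, String>emptyMap()),
            new Node("n3", Collections.singletonMap("service.host", "true")));

        // Assumed convention: nodes that may host the service carry this attribute.
        Predicate<Node> filter = n -> "true".equals(n.attributes.get("service.host"));

        System.out.println(assign(topology, filter, 2));
    }
}
```

In this sketch the filter looks only at node attributes, so it can be evaluated without the service class itself; whether that holds for the real deployment message is an open design question.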


As for peer class loading, I would back up Dmitriy here. Let's at
least not focus on this task for now. We should design service
versioning in the right way first and support it.

--
Denis




Re: Service grid redesign

Valentin Kulichenko
I don't think peer class loading is even possible for services. I believe
we should reuse DeploymentSpi [1] for versioning.

[1] https://apacheignite.readme.io/docs/deployment-spi

-Val

On Wed, Apr 4, 2018 at 12:52 PM, Denis Magda <[hidden email]> wrote:

> Sorry, it was me who renamed the IEP to "Oil Change in Service Grid"; I was
> writing this email after the renaming. I like that title more because it's
> fun and highlights what we intend to do: cleaning up our service grid
> engine and powering it up with a new "liquid" (a new communication and
> deployment approach not available before).
>
> Denis
>
>
> > This message contains serialized service instance and its configuration.
> > It is delivered to the coordinator node first, that calculates the
> service
> > deployment assignments and adds this information to the message.
>
>
> I would consider using a NodeFilter first to decide where a service can
> potentially be deployed. Otherwise, we would require service classes to be on
> every node (every node might become a coordinator), which is not a desirable
> requirement.
>
>
> As for peer class loading, I would back up Dmitriy here. Let's at
> least not focus on this task for now. We should design service
> versioning in the right way first and support it.
>
> --
> Denis
>
>
>
> On Wed, Apr 4, 2018 at 12:20 PM, Dmitriy Setrakyan <[hidden email]>
> wrote:
>
> > Here is the correct link:
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-
> > 17%3A+Oil+Change+in+Service+Grid
> >
> > I have looked at the tickets there, and I believe that we should not
> > support peer-deployment for services. It is very hard and I do not think
> we
> > should even try.
> >
> > I am proposing closing this ticket as Won't Fix -
> > https://issues.apache.org/jira/browse/IGNITE-975
> >
> > D.
> >
> > On Wed, Apr 4, 2018 at 5:39 AM, Denis Mekhanikov <[hidden email]>
> > wrote:
> >
> > > Vyacheslav,
> > >
> > > I've just posted my first draft of the IEP:
> > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-
> > 17%3A+Service+grid+
> > > improvements
> > > It's not finished yet, but you can get the idea from it.
> > > If you have some thoughts on your mind, please let me know, I'll add
> them
> > > to the IEP.
> > >
> > > Denis
> > >
> > > > On Wed, Apr 4, 2018 at 13:09, Vyacheslav Daradur <[hidden email]> wrote:
> > >
> > > > Denis, thanks for the link.
> > > >
> > > > I looked through the task and I think that understand your redesign
> > point
> > > > now.
> > > >
> > > > Do you have a clear plan or IEP for the whole redesign?
> > > >
> > > > I'm interested in this component and I'd like to take part in the
> > > > development.
> > > >
> > > >
> > > >
> > > > On Mon, Apr 2, 2018 at 2:55 PM, Denis Mekhanikov <
> > [hidden email]>
> > > > wrote:
> > > > > Vyacheslav,
> > > > >
> > > > > Service deployment design, based on replicated utility cache has
> > proven
> > > > to
> > > > > be unstable and deadlock-prone.
> > > > > You can find a list of JIRA issues, connected to it, in my previous
> > > > letter.
> > > > >
> > > > > The intention behind it is similar to the binary metadata redesign,
> > > that
> > > > > happened in the following ticket: IGNITE-4157
> > > > > <https://issues.apache.org/jira/browse/IGNITE-4157>
> > > > > This change in service deployment procedure will eliminate need for
> > > > another
> > > > > internal replicated cache
> > > > > and make service deployment more reliable on unstable topology.
> > > > >
> > > > > Denis
> > > > >
> > > > > > On Tue, Mar 27, 2018 at 23:21, Vyacheslav Daradur <[hidden email]> wrote:
> > > > >
> > > > >> Hi, Denis Mekhanikov!
> > > > >>
> > > > >> As far as I know, Ignite services are based on IgniteCache and we
> > have
> > > > >> all its features. We can use listeners or continuous queries for
> > > > >> deployment synchronizations.
> > > > >>
> > > > >> Why do you want using the discovery layer for that?
> > > > >>
> > > > >> One more thing: we can use baseline approach for services, that
> > means
> > > > >> *IgniteService.deploy()* returns ready to work service after
> > > > >> deployment on baseline nodes and deploy to other nodes on demand,
> > for
> > > > >> example when deployed service's loading will be hight.
> > > > >>
> > > > >> About versioning, maybe there is sense to extend public API:
> > > > >> IgniteServices.service(name, *version*)?
> > > > >>
> > > > >> At first deployment, we can compute service's hashcode (just for
> an
> > > > >> example) and store it, after new deployment request for services
> > with
> > > > >> an existing name we will compute new service's hashcode and
> compare
> > > > >> them if they have different hashcodes that we will deploy new
> > service
> > > > >> as service with a different version.

Re: Service grid redesign

Denis Mekhanikov
Denis,
There is no need to deserialize services on the coordinator. It only needs
to be able to calculate the assignments.
*LazyServiceConfiguration* should be used to deliver the service
configurations, just as it is done right now.
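
To illustrate the point, here is a minimal sketch of how a coordinator could calculate assignments from metadata alone while the service instance stays serialized. The names LazyAssignmentSketch, LazyConfig and assign are hypothetical and are not Ignite classes; the round-robin placement and the assumption that totalCount and maxPerNode are both positive are simplifications, not the actual assignment logic.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; this is NOT Ignite's LazyServiceConfiguration. */
final class LazyAssignmentSketch {
    /** Service description whose instance is kept as opaque bytes. */
    static final class LazyConfig {
        final String name;
        final int totalCount;      // assumed > 0 in this sketch
        final int maxPerNode;      // assumed > 0 in this sketch
        final byte[] serviceBytes; // never deserialized by assign()

        LazyConfig(String name, int totalCount, int maxPerNode, byte[] serviceBytes) {
            this.name = name;
            this.totalCount = totalCount;
            this.maxPerNode = maxPerNode;
            this.serviceBytes = serviceBytes;
        }
    }

    /** Spreads instances round-robin over the nodes, reading metadata only. */
    static Map<String, Integer> assign(LazyConfig cfg, List<String> nodeIds) {
        Map<String, Integer> res = new LinkedHashMap<>();
        for (String id : nodeIds)
            res.put(id, 0);
        if (nodeIds.isEmpty())
            return res;

        // Never place more instances than the nodes are allowed to host.
        int capacity = nodeIds.size() * cfg.maxPerNode;
        int toPlace = Math.min(cfg.totalCount, capacity);

        for (int i = 0; i < toPlace; i++) {
            String id = nodeIds.get(i % nodeIds.size());
            res.put(id, res.get(id) + 1);
        }
        return res;
    }
}
```

Because assign() never touches serviceBytes, a coordinator in this sketch would not need the service class on its classpath; only the nodes that receive a non-zero count would have to deserialize the instance.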

Val,
Using DeploymentSpi is a good idea; I hadn't thought of this possibility.
It is a viable alternative to peer class loading, though not as
user-friendly.
But if peer class loading is that hard to implement, then I vote for
DeploymentSpi.
As far as I understand, it won't require any additional changes in Ignite,
but users will have to think about configuring a proper DeploymentSpi.
Please correct me if I'm wrong.
It would be good, though, to add some examples of service redeployment
when the implementation class changes.

Denis

On Thu, Apr 5, 2018 at 2:33, Valentin Kulichenko <[hidden email]> wrote:

> I don't think peer class loading is even possible for services. I believe
> we should reuse DeploymentSpi [1] for versioning.
>
> [1] https://apacheignite.readme.io/docs/deployment-spi
>
> -Val

Re: Service grid redesign

Denis Magda
>
> There is no need to deserialize services on the coordinator. It should only
> be able to calculate the assignments.
> *LazyServiceConfiguration* should be used to deliver the service
> configurations, just like it is done right now.


Can that configuration change over time in a way that requires updating the
class on all the nodes (for instance, when someone wants to deploy the next
version of a service)? I just want to be sure we don't need to restart
cluster nodes that won't be used for service deployments whenever
service-related configuration changes.

--
Denis

On Thu, Apr 5, 2018 at 8:18 AM, Denis Mekhanikov <[hidden email]>
wrote:

> Denis,
> There is no need to deserialize services on the coordinator. It should only
> be able to calculate the assignments.
> *LazyServiceConfiguration* should be used to deliver the service
> configurations, just like it is done right now.
>
> Val,
> Usage of DeploymentSpi is a good idea, I didn't think about this
> possibility.
> This is a viable alternative to peer-class-loading, not that user-friendly
> though.
> But if peer-class-loading is that hard to implement, then I vote for
> DeploymentSpi.
> As far as I understand, it won't require us to do any additional changes in
> Ignite, but will make users think about using a proper DeploymentSpi.
> Please correct me, if I'm wrong.
> It would be good, though, to add some examples on service redeployment,
> when implementation class changes.
>
> Denis

Re: Service grid redesign

Valentin Kulichenko
Denis,

This is why I'm suggesting that we use DeploymentSpi for this. The way I see
it, instead of deploying classes on the local classpath, the user can
deploy them to the storage that the SPI points to. If a class is updated in
the storage, Ignite detects this and automatically restarts the service.
This is a simple and straightforward approach that doesn't require a lot of
changes on our side and lets us reuse the existing implementation of
DeploymentSpi.

-Val
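
The detect-and-restart idea in the message above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `ClassStorage` and `AutoRedeployer` are hypothetical names invented for the illustration and are not Ignite classes, the storage is reduced to a map from class name to version number, and change detection is a plain polling pass rather than whatever mechanism a real DeploymentSpi implementation uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the storage a DeploymentSpi points to: class name -> version. */
final class ClassStorage {
    private final Map<String, Integer> versions = new HashMap<>();

    /** Publishes (or updates) a class in the storage. */
    void publish(String className, int version) {
        versions.put(className, version);
    }

    /** Returns the latest published version, or 0 if the class is unknown. */
    int versionOf(String className) {
        return versions.getOrDefault(className, 0);
    }
}

/** Hypothetical node-side watcher that restarts a service when its class version changes. */
final class AutoRedeployer {
    private final ClassStorage storage;
    private final Map<String, Integer> running = new HashMap<>();
    final List<String> log = new ArrayList<>();

    AutoRedeployer(ClassStorage storage) {
        this.storage = storage;
    }

    /** Starts a service using the class version currently in the storage. */
    void deploy(String className) {
        int version = storage.versionOf(className);
        running.put(className, version);
        log.add("start " + className + " v" + version);
    }

    /** One polling pass: restart every service whose class changed in the storage. */
    void poll() {
        for (Map.Entry<String, Integer> e : running.entrySet()) {
            int latest = storage.versionOf(e.getKey());
            if (latest != e.getValue()) {
                log.add("restart " + e.getKey() + " v" + e.getValue() + " -> v" + latest);
                e.setValue(latest);
            }
        }
    }
}
```

In this model each node runs its own `poll()` independently, so the moment a node picks up a new version depends only on when that node happens to poll.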

On Thu, Apr 5, 2018 at 12:13 PM, Denis Magda <[hidden email]> wrote:

> > There is no need to deserialize services on the coordinator. It should only
> > be able to calculate the assignments.
> > *LazyServiceConfiguration* should be used to deliver the service
> > configurations, just like it is done right now.
>
> Can that configuration be tweaked over time, requiring the class to be
> updated on all the nodes (if, for instance, someone wants to deploy the next
> version of a service)? I just want to be sure we don't need to restart the
> cluster nodes (those that won't be used for service deployments) on
> service-related configuration changes.
>
> --
> Denis
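
The point quoted above, that the coordinator only has to calculate assignments and never needs the service class itself, can be sketched as follows. This is an illustrative model, not Ignite code: `LazyConfig` and `Coordinator` are hypothetical names, the two limits are simplified stand-ins for a total instance count and a per-node cap, and the round-robin placement is an assumption chosen for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lazy configuration: the service instance stays serialized. */
final class LazyConfig {
    final String name;
    final int totalCount;      // how many instances to deploy cluster-wide
    final int maxPerNode;      // cap on instances per node
    final byte[] serviceBytes; // opaque payload, only hosting nodes deserialize it

    LazyConfig(String name, int totalCount, int maxPerNode, byte[] serviceBytes) {
        this.name = name;
        this.totalCount = totalCount;
        this.maxPerNode = maxPerNode;
        this.serviceBytes = serviceBytes;
    }
}

final class Coordinator {
    /**
     * Spreads instances round-robin over the given nodes, honoring maxPerNode.
     * The method never touches serviceBytes, so the service class does not have
     * to be on the coordinator's classpath.
     */
    static Map<String, Integer> assign(LazyConfig cfg, List<String> nodeIds) {
        Map<String, Integer> assignments = new LinkedHashMap<>();
        int remaining = cfg.totalCount;
        boolean progress = true;

        // Keep making passes over the nodes until everything is placed
        // or every node has reached its cap.
        while (remaining > 0 && progress) {
            progress = false;
            for (String id : nodeIds) {
                if (remaining == 0)
                    break;
                int current = assignments.getOrDefault(id, 0);
                if (current < cfg.maxPerNode) {
                    assignments.put(id, current + 1);
                    remaining--;
                    progress = true;
                }
            }
        }
        return assignments;
    }
}
```

With five requested instances, a cap of two per node and three nodes, this sketch places two, two and one instances; if the cap makes the request unsatisfiable, it simply places as many as it can.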

Re: Service grid redesign

Denis Magda-2
Val,

Sounds like a great solution. I'm totally for it.

--
Denis

On Thu, Apr 5, 2018 at 12:32 PM, Valentin Kulichenko <
[hidden email]> wrote:

> Denis,
>
> This is why I'm suggesting that we use DeploymentSpi for this. The way I see
> it, instead of deploying classes on the local classpath, the user can
> deploy them to the storage that the SPI points to. If a class is updated in the
> storage, Ignite detects this and automatically restarts the service. This
> is a very simple and straightforward approach that doesn't require a lot
> of changes on our side and allows us to reuse the existing implementation of
> DeploymentSpi.
>
> -Val

Re: Service grid redesign

Denis Mekhanikov
Val,

I don't really like the idea of automatically redeploying services when
classes change.
Different nodes may detect these changes at different moments in time, so
there is no guarantee that all nodes run the same version.
And if redeployment fails, there is no way to notify user code about it.
Also, service fields may change between versions, so already deployed
services could not be deserialized using the new classes.

I think it would be better if the user could trigger redeployment manually.
That would solve the problems above and let the user redeploy services even
when only their field parameters change and the implementation does not.
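
To make the version-skew argument concrete, here is a minimal toy model in plain Java. It is not Ignite code: Node, ClassStore, pollSome and redeploy are hypothetical names invented for this sketch. It only illustrates that per-node detection can leave nodes on different versions, while a single user-triggered call can pin one version and hand failures back to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; every name here is hypothetical and none of it is Ignite API. */
class RedeploySketch {
    /** A cluster node running one version of a service. */
    static final class Node {
        final String id;
        int deployedVersion;

        Node(String id, int deployedVersion) {
            this.id = id;
            this.deployedVersion = deployedVersion;
        }
    }

    /** Shared class storage that holds the latest service version. */
    static final class ClassStore {
        int latestVersion = 1;
    }

    /** Three nodes, all running version 1. */
    static List<Node> cluster() {
        List<Node> nodes = new ArrayList<>();
        for (String id : new String[] {"A", "B", "C"})
            nodes.add(new Node(id, 1));
        return nodes;
    }

    /** Automatic mode: only the first nodesThatPolled nodes have noticed the change so far. */
    static void pollSome(List<Node> nodes, ClassStore store, int nodesThatPolled) {
        for (int i = 0; i < nodesThatPolled; i++)
            nodes.get(i).deployedVersion = store.latestVersion;
    }

    /**
     * Manual mode: one call fixes the target version for every node and
     * returns the ids of nodes where redeployment failed (simulated by failingNodeId).
     */
    static List<String> redeploy(List<Node> nodes, int version, String failingNodeId) {
        List<String> failures = new ArrayList<>();
        for (Node n : nodes) {
            if (n.id.equals(failingNodeId))
                failures.add(n.id);
            else
                n.deployedVersion = version;
        }
        return failures;
    }

    /** True when all nodes run the same version. */
    static boolean sameVersionEverywhere(List<Node> nodes) {
        return nodes.stream().mapToInt(n -> n.deployedVersion).distinct().count() == 1;
    }

    public static void main(String[] args) {
        ClassStore store = new ClassStore();
        store.latestVersion = 2;

        List<Node> auto = cluster();
        pollSome(auto, store, 1); // only node A has polled so far
        System.out.println("automatic, same version: " + sameVersionEverywhere(auto));

        List<Node> manual = cluster();
        List<String> failed = redeploy(manual, store.latestVersion, null);
        System.out.println("manual, same version: " + sameVersionEverywhere(manual)
            + ", failed nodes: " + failed);
    }
}
```

The design point is that the manual call has a caller, so a failure list (or an exception) has somewhere to go; a background detector has no such caller.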

What do you think?

Denis


Re: Service grid redesign

Dmitriy Pavlov
Hi Igniters,

I like automatic redeployment, as long as it can be disabled in the
configuration when the user wants to control this process. What do you think?
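
One way to picture such a switch is the sketch below. It is plain Java with hypothetical names (Config.autoRedeploy, ServiceManager, onClassChanged, redeployPending); no such property exists in Ignite as far as this thread states, so treat it purely as an illustration of the proposal.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only; the flag and all class names are hypothetical, not Ignite API. */
class RedeployPolicySketch {
    /** Hypothetical configuration holding the on/off switch for automatic redeployment. */
    static final class Config {
        final boolean autoRedeploy;

        Config(boolean autoRedeploy) {
            this.autoRedeploy = autoRedeploy;
        }
    }

    /** Reacts to class changes according to the configured policy. */
    static final class ServiceManager {
        private final Config cfg;
        private final List<String> pending = new ArrayList<>();
        private final List<String> redeployed = new ArrayList<>();

        ServiceManager(Config cfg) {
            this.cfg = cfg;
        }

        /** Called when a changed class is detected for the named service. */
        void onClassChanged(String serviceName) {
            if (cfg.autoRedeploy)
                redeployed.add(serviceName);      // redeploy right away
            else if (!pending.contains(serviceName))
                pending.add(serviceName);         // hold back until the user asks
        }

        /** Explicit user trigger: redeploys everything that was held back. */
        void redeployPending() {
            redeployed.addAll(pending);
            pending.clear();
        }

        List<String> pending() {
            return new ArrayList<>(pending);
        }

        List<String> redeployed() {
            return new ArrayList<>(redeployed);
        }
    }

    public static void main(String[] args) {
        ServiceManager manual = new ServiceManager(new Config(false));
        manual.onClassChanged("svc");
        System.out.println("held back: " + manual.pending());
        manual.redeployPending();
        System.out.println("redeployed after user trigger: " + manual.redeployed());
    }
}
```

With the flag off, a detected change is only recorded, and nothing is redeployed until the user calls the explicit trigger.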

Sincerely,
Dmitriy Pavlov

> > > account
> > > > > > > though.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> First of all, we can't wait for services to be
> > deployed
> > > > and
> > > > > > > > > > initialised
> > > > > > > > > > >> in
> > > > > > > > > > >> >> the discovery thread.
> > > > > > > > > > >> >> So, we need to make notification about service
> > > deployment
> > > > > > > result
> > > > > > > > > > >> >> asynchronous, presumably over communication
> protocol.
> > > > > > > > > > >> >> I can think of a procedure similar to the current
> > > > exchange
> > > > > > > > > protocol,
> > > > > > > > > > >> when
> > > > > > > > > > >> >> service deployment is initialised with an initial
> > > > discovery
> > > > > > > > > message,
> > > > > > > > > > >> >> followed by asynchronous notifications from the
> > hosting
> > > > > > servers
> > > > > > > > > over
> > > > > > > > > > >> >> communication. And finally, one more discovery
> > message
> > > > will
> > > > > > > > notify
> > > > > > > > > > all
> > > > > > > > > > >> >> nodes about the service deployment result and
> > location
> > > of
> > > > > the
> > > > > > > > > > deployed
> > > > > > > > > > >> >> service instances. Coordinator will be responsible
> > for
> > > > > > > collecting
> > > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > >> >> deployment results in this scheme.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> Another problem is failover in case, when some
> nodes
> > > fail
> > > > > > > during
> > > > > > > > > > >> deployment
> > > > > > > > > > >> >> or further work.
> > > > > > > > > > >> >> The following cases should be handled:
> > > > > > > > > > >> >>
> > > > > > > > > > >> >>    1. coordinator failure during deployment;
> > > > > > > > > > >> >>    2. failure of nodes, that were chosen to host
> the
> > > > > service,
> > > > > > > > > during
> > > > > > > > > > >> >>    deployment;
> > > > > > > > > > >> >>    3. failure of nodes, that contain deployed
> > services,
> > > > > after
> > > > > > > the
> > > > > > > > > > >> >>    deployment.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> The first case may be resolved by either
> continuation
> > > of
> > > > > > > > deployment
> > > > > > > > > > >> with a
> > > > > > > > > > >> >> new coordinator, or by cancelling it.
> > > > > > > > > > >> >> The second case will require another node to be
> > chosen
> > > > and
> > > > > > > > > notified.
> > > > > > > > > > >> Maybe
> > > > > > > > > > >> >> another discovery message will be needed.
> > > > > > > > > > >> >> The third case will require redeployment, so
> > > coordinator
> > > > > > should
> > > > > > > > > track
> > > > > > > > > > >> >> topology changes and redeploy failed services.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> Another good improvement would be service
> versioning.
> > > > This
> > > > > > > matter
> > > > > > > > > was
> > > > > > > > > > >> >> already discussed in another thread:
> > > > > > > > > > >> >>
> > > > > > > > > > >>
> > > > > > > > > > http://apache-ignite-developers.2346864.n4.nabble.
> > > > > > > > > com/Service-versioning-
> > > > > > > > > > >> >> td20858.html
> > > > > > > > > > >> >> Let's resume this discussion and state the final
> > > decision
> > > > > > here.
> > > > > > > > > > >> >> This feature is closely connected to peer class
> > > loading,
> > > > > > which
> > > > > > > is
> > > > > > > > > not
> > > > > > > > > > >> >> working for services currently.
> > > > > > > > > > >> >> So, service versioning should be implemented along
> > with
> > > > > peer
> > > > > > > > class
> > > > > > > > > > >> loading.
> > > > > > > > > > >> >> JIRA ticket for versioning:
> > > > > > > > > > >> >> https://issues.apache.org/jira/browse/IGNITE-6069
> > > > > > > > > > >> >> Peer class loading: https://issues.apache.org/
> > > > > > > > > jira/browse/IGNITE-975
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> Please share your thoughts. Constructive criticism
> is
> > > > > highly
> > > > > > > > > > >> appreciated.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> Denis
> > > > > > > > > > >> >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> --
> > > > > > > > > > >> Best Regards, Vyacheslav D.
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Service grid redesign

dsetrakyan
On Fri, Apr 6, 2018 at 9:13 AM, Dmitry Pavlov <[hidden email]> wrote:

> Hi Igniters,
>
> I like automatic redeploy which can be disabled by config if user wants to
> control this process. What do you think?
>

I do not think we should have anything automatic when it comes to
deployment; everything should be explicit. However, if we use the
deployment SPI, then a user should be able to do a "hot" redeploy, where a
new service will be deployed when the user drops in an updated jar.

We should not create anything new here. Ignite already has a deployment SPI
and it already works in a certain way. Let's not change it.

D.

Re: Service grid redesign

Valentin Kulichenko
Yes, the class deployment itself has to be explicit. I.e., there has to be
a manual step where the user updates the class, and the exact step required
depends on the DeploymentSpi implementation. But then Ignite takes care of
everything else: service redeployment and restart are automatic.

Dmitriy Pavlov, all this is going to be disabled if DeploymentSpi is not
configured. In that case service class definitions have to be deployed on
the local classpath and can't be updated at runtime, just as it works right
now.

-Val
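
The rule described above can be sketched as a toy model. This is only an
illustration of the proposed behaviour, not Ignite code: RedeploySketch,
deploy, onClassUpdated and the integer class versions are hypothetical names
introduced here. The explicit step is the class update; the restart that
follows is automatic, and nothing happens when no deployment SPI is configured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the proposed rule. None of these types are Ignite APIs. */
final class RedeploySketch {
    /** Stands in for "a DeploymentSpi is configured" (assumption of this sketch). */
    private final boolean deploymentSpiConfigured;

    /** Service name -> class version the running instance was started from. */
    private final Map<String, Integer> runningVersions = new HashMap<>();

    /** Record of automatic restarts, kept only for inspection. */
    private final List<String> restarts = new ArrayList<>();

    RedeploySketch(boolean deploymentSpiConfigured) {
        this.deploymentSpiConfigured = deploymentSpiConfigured;
    }

    /** Explicit initial deployment of a service at class version 1. */
    void deploy(String serviceName) {
        runningVersions.put(serviceName, 1);
    }

    /**
     * The explicit user step: a newer class version is published
     * (for example, an updated jar is dropped into a watched folder).
     * Returns true if the running service was restarted on the new version.
     */
    boolean onClassUpdated(String serviceName, int newVersion) {
        if (!deploymentSpiConfigured)
            return false; // Classes come from the local classpath; no runtime updates.

        Integer current = runningVersions.get(serviceName);

        if (current == null || newVersion <= current)
            return false; // Unknown service or nothing newer to pick up.

        // The automatic part: restart the service on the new class version.
        runningVersions.put(serviceName, newVersion);
        restarts.add(serviceName + "@v" + newVersion);

        return true;
    }

    int runningVersion(String serviceName) {
        return runningVersions.getOrDefault(serviceName, 0);
    }

    List<String> restarts() {
        return restarts;
    }

    public static void main(String[] args) {
        RedeploySketch grid = new RedeploySketch(true);
        grid.deploy("pricing");
        grid.onClassUpdated("pricing", 2);
        System.out.println("Restarts: " + grid.restarts());
    }
}
```

With the SPI flag set to false, onClassUpdated never restarts anything, which
matches the "just like it works right now" case.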


Re: Service grid redesign

Denis Mekhanikov
Val,

Sounds reasonable. I just think that the user should have some way to know
that a new version of a service class was deployed.
One way to do it is to listen to *EVT_CLASS_DEPLOYED*. I'm not sure
whether it is triggered on class redeployment, though. If not, then another
event type should be added.

I don't think that many people will implement their own *DeploymentSpi*,
so we should make working with *UriDeploymentSpi* as comfortable as possible.

Denis


Re: Service grid redesign

Denis Mekhanikov
Another question that I would like to discuss is whether services should
be preserved across cluster restarts.

Currently it depends on the persistence configuration. If persistence is
enabled for any data region, then services are persisted as well. This is a
pretty strange way of configuring this behaviour.
I'm not sure whether anybody relies on this functionality right now. Should we
support it at all? If yes, should we make it configurable?

Denis
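
To make the question concrete, here is a small sketch of the rule as described
above, next to one possible configurable variant. Everything in it is an
assumption made for illustration: DataRegion is a stand-in rather than Ignite's
own configuration class, and the persistServices setting does not exist in
Ignite.

```java
import java.util.Arrays;

/** Toy model of the persistence rule. None of these types are Ignite APIs. */
final class ServicePersistenceSketch {
    /** Hypothetical stand-in for a data region's configuration. */
    static final class DataRegion {
        final String name;
        final boolean persistenceEnabled;

        DataRegion(String name, boolean persistenceEnabled) {
            this.name = name;
            this.persistenceEnabled = persistenceEnabled;
        }
    }

    /** Behaviour as described: services are persisted if any region is persistent. */
    static boolean persistedToday(DataRegion... regions) {
        return Arrays.stream(regions).anyMatch(r -> r.persistenceEnabled);
    }

    /**
     * Hypothetical configurable variant: an explicit setting wins when present,
     * otherwise the implicit rule above still applies.
     */
    static boolean persistedIfConfigurable(Boolean persistServices, DataRegion... regions) {
        return persistServices != null ? persistServices : persistedToday(regions);
    }

    public static void main(String[] args) {
        DataRegion inMemory = new DataRegion("default", false);
        DataRegion onDisk = new DataRegion("history", true);

        System.out.println("Implicit rule: " + persistedToday(inMemory, onDisk));
        System.out.println("Explicit opt-out: " + persistedIfConfigurable(false, inMemory, onDisk));
    }
}
```

The second method shows why an explicit setting may be easier to reason about:
the answer no longer depends on how unrelated data regions are configured.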
