Potential OOM while iterating over query cursor. Review needed.

Andrew Mashenkov
Hi Igniters,

There is an issue, IGNITE-8892 [1], related to OOM during distributed query
execution.
The issue is not limited to ScanQuery usage and appears to affect all
query types.

The use case is quite simple: one server and one client.
The client starts a scan query with default flags and iterates over the
cursor.
If the whole query result does not fit into memory, the JVM crashes with
an OOM. This is not expected, because the client takes entries one by one
and discards them immediately.
A reproducer is attached to the ticket.

The same query works fine if it is started on the server. It seems there is
no DistributedQueryFuture in that case, so everything works.

I've found that GridCacheDistributedQueryFuture collects all entries and
tries to return the collection via onDone().
This behaviour is turned on by the 'keepAll' flag, which is true by default.
Iterating over the cache via cache.iterator() has no OOM issues because we
set the keepAll flag to false there.
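
To illustrate the effect, here is a rough sketch in plain Java. It is a
simplified, hypothetical model and not the actual Ignite code: the class
PagedResultModel and its methods are made up for illustration, and only the
names keepAll and allCol are borrowed from the discussion above. It streams
pages through a queue, lets the consumer drop each page right away, and
reports the peak number of entries still held.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical, simplified model of a paged query future (not Ignite code). */
final class PagedResultModel {
    private final boolean keepAll;
    private final Queue<List<Integer>> queue = new ArrayDeque<>();
    private final List<Integer> allCol = new ArrayList<>();

    PagedResultModel(boolean keepAll) {
        this.keepAll = keepAll;
    }

    /** A page arrives from a remote node. */
    void onPage(List<Integer> page) {
        queue.add(page);
        if (keepAll)
            allCol.addAll(page); // kept even after the consumer drops the page
    }

    /** The consumer takes the next page; the queue no longer references it. */
    List<Integer> nextPage() {
        return queue.poll();
    }

    /** Number of distinct entries this object still keeps reachable. */
    int retained() {
        if (keepAll)
            return allCol.size(); // queued pages are a subset of allCol
        int n = 0;
        for (List<Integer> p : queue)
            n += p.size();
        return n;
    }

    /** Streams pages, consuming each one immediately; returns the peak retained count. */
    static int peakRetained(boolean keepAll, int pages, int pageSize) {
        PagedResultModel m = new PagedResultModel(keepAll);
        int peak = 0;
        int next = 0;
        for (int p = 0; p < pages; p++) {
            List<Integer> page = new ArrayList<>(pageSize);
            for (int i = 0; i < pageSize; i++)
                page.add(next++);
            m.onPage(page);
            peak = Math.max(peak, m.retained());
            m.nextPage(); // consumer processes the page and throws it away
        }
        return peak;
    }
}
```

In this model, with keepAll=true the retained count grows with the total
result size, while with keepAll=false it stays bounded by one page.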

Why is keepAll=true the default? It seems no one expects future.get() to
return any data, and all queries work through a queue in paging mode.
Would it be OK to get rid of 'allCol' and the keepAll flag altogether?

I've made a PR and TC looks fine.
Could someone review it, please?

[1] https://issues.apache.org/jira/browse/IGNITE-8892


--
Best regards,
Andrey V. Mashenkov

Re: Potential OOM while iterating over query cursor. Review needed.

Alexey Goncharuk
Folks,

Bumping up the discussion as it is hitting one of the Ignite users.

The change seems reasonable to me, but it is a breaking change and may
affect existing users. Would the community be OK if we change the
QueryCursor#getAll method for scan queries? If not, we should expose the
keepAll() flag to the public API.

(In reply to Andrey Mashenkov's message of Fri, 29 Jun 2018, 11:37.)

Re: Potential OOM while iterating over query cursor. Review needed.

Andrew Mashenkov
Alexey,

I saw no issues on TC with this change, and it affects only private API.
If you think it can break something, then we can mark the keepAll flag as
deprecated, to be deleted in 3.0, and change its default value to false.
I doubt this flag is useful for Ignite internals. Moreover, a user can
always wrap the query and implement the same logic if such behavior is
really needed.
So, I vote for removing the code, provided it really is useless.
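
For example, such a wrapper could look roughly like the sketch below. The
class name KeepAllCursor is hypothetical and nothing here is Ignite API. It
only assumes that the cursor can be iterated as a java.lang.Iterable, and it
records every entry as the caller iterates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical wrapper that remembers every entry handed out by the delegate. */
final class KeepAllCursor<T> implements Iterable<T> {
    private final Iterable<T> delegate;
    private final List<T> seen = new ArrayList<>();

    KeepAllCursor(Iterable<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<T> it = delegate.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public T next() {
                T e = it.next();
                seen.add(e); // keep a reference, as keepAll=true would
                return e;
            }
        };
    }

    /** Read-only view of the entries iterated so far. */
    List<T> seen() {
        return Collections.unmodifiableList(seen);
    }
}
```

The memory cost is then explicit and opt-in: only callers that wrap the
cursor pay for holding the whole result.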

(In reply to Alexey Goncharuk's message of Wed, 11 Jul 2018, 13:26.)

Re: Potential OOM while iterating over query cursor. Review needed.

Alexey Goncharuk
Andrey,

Correct me if I am wrong, but my impression was that after the change
cursor#getAll() will return only the last page of the result, not the whole
collection. If the public API method semantics are preserved, then I have
no objections.
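
To make the expected semantics concrete, here is a small sketch of a paged
cursor whose getAll() still returns the whole result by draining the pages
one after another, without a separately retained collection. PagedCursor and
its page loader are hypothetical names for illustration, not Ignite classes,
and I am assuming an empty page marks the end of the result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Hypothetical paged cursor (illustrative only, not Ignite API). */
final class PagedCursor<T> implements Iterable<T> {
    /** Returns the page with the given index; an empty list means no more data. */
    private final IntFunction<List<T>> pageLoader;

    PagedCursor(IntFunction<List<T>> pageLoader) {
        this.pageLoader = pageLoader;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int pageIdx = 0;
            private Iterator<T> cur = Collections.emptyIterator();
            private boolean done = false;

            @Override
            public boolean hasNext() {
                // Load the next page only when the current one is exhausted.
                while (!cur.hasNext() && !done) {
                    List<T> page = pageLoader.apply(pageIdx++);
                    if (page.isEmpty())
                        done = true;
                    else
                        cur = page.iterator();
                }
                return cur.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext())
                    throw new NoSuchElementException();
                return cur.next();
            }
        };
    }

    /** Drains every page, so the caller gets the whole result, not just the last page. */
    List<T> getAll() {
        List<T> all = new ArrayList<>();
        for (T e : this)
            all.add(e);
        return all;
    }
}
```

A check along these lines (getAll() over several pages returns every entry,
in order) would confirm that the public semantics are preserved.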

(In reply to Andrey Mashenkov's message of Wed, 11 Jul 2018, 15:18.)