Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Explain how MaxItems is applied in more detail #3151

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
93 changes: 85 additions & 8 deletions docs/source/topics/paginators.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
Botocore Paginators
Botocore paginators
===================

Some AWS operations return results that are incomplete and require subsequent
Expand All @@ -13,15 +13,17 @@ results.
process of iterating over an entire result set of a truncated API operation.


Creating Paginators
Creating paginators
-------------------

Paginators are created via the ``get_paginator()`` method of a botocore
client. The ``get_paginator()`` method accepts an operation name and returns
a reusable ``Paginator`` object. You then call the ``paginate`` method of the
Paginator, passing in any relevant operation parameters to apply to the
underlying API operation. The ``paginate`` method then returns an iterable
``PageIterator``::
``PageIterator``:

.. code-block:: python

import botocore.session

Expand All @@ -39,13 +41,15 @@ underlying API operation. The ``paginate`` method then returns an iterable
print(page['Contents'])


Customizing Page Iterators
Customizing page iterators
~~~~~~~~~~~~~~~~~~~~~~~~~~

You must call the ``paginate`` method of a Paginator in order to iterate over
the pages of API operation results. The ``paginate`` method accepts a
``PaginationConfig`` named argument that can be used to customize the
pagination::
pagination.

.. code-block:: python

paginator = client.get_paginator('list_objects')
page_iterator = paginator.paginate(Bucket='my-bucket',
Expand Down Expand Up @@ -73,11 +77,13 @@ pagination::
Filtering results
-----------------

Many Paginators can be filtered server-side with options that are passed
Many paginators can be filtered server-side with options that are passed
through to each underlying API call. For example,
:py:meth:`S3.Paginator.list_objects.paginate` accepts a ``Prefix`` parameter
used to filter the paginated results by prefix server-side before sending them
to the client::
to the client.

.. code-block:: python

import botocore.session
session = botocore.session.get_session()
Expand All @@ -90,7 +96,78 @@ to the client::
print(page['Contents'])


Filtering Results with JMESPath
Prefixes, delimiters, and `MaxItems`
------------------------------------

When using ``list_objects`` with a delimiter and the ``MaxItems`` option, the
``MaxItems`` limit only applies to the objects matched and returned in the
``Contents`` list. This means that while the total number of objects
enumerated in ``Contents`` will be no greater than ``MaxItems`` items, there
may be values in ``CommonPrefixes`` beyond that limit.

For example, picture a scenario in which a bucket contains 20,000 objects,
each with a different prefix, using the slash character ("/") as a delimiter:

* ``bucket-name/prefix1/key1``
* ``bucket-name/prefix2/key2``
* ...
* ``bucket-name/prefixN/keyN``

With that in mind, consider what happens when the following code runs:

.. code-block:: python

num_prefixes = 0
num_keys = 0

session = botocore.session.get_session()
s3 = session.client_create('s3')
paginator = s3.get_paginator('list_objects_v2')

for result in paginator.paginate(
Bucket='bucket-name', Delimiter='/',
PaginationConfig={'MaxItems': 2000}):
for prefix in result.get('CommonPrefixes', []):
num_prefixes += 1
for item in result.get('Contents', []):
num_keys += 1

This code iterates over the 20,000 objects, limiting the total number of objects
listed to 2,000. Because the results include the 20,000 common prefixes, this
paginator runs far longer than expected, since it still processes all 20,000
common prefixes despite the value of ``MaxItems``.

To process a maximum number of total items, track the total number of results
and when it reaches the limit, break out of the paginator's loop.

.. code-block:: python

num_prefixes = 0
num_keys = 0

session = botocore.session.get_session()
s3 = session.create_client('s3')
paginator = s3.get_paginator('list_objects_v2')

for result in paginator.paginate(
Bucket='bucket-name', Delimiter='/'):
prefixes = result.get('CommonPrefixes', [])
keys = result.get('Contents', [])

num_prefixes += len(prefixes)
num_keys += len(keys)
if num_prefixes + num_keys > 2000:
break

for prefix in prefixes:
print(f"Prefix: {prefix['Prefix']}")
for key in keys:
print(f"Key: {key['Key']}")

This will stop pagination when the combined size of the ``CommonPrefixes`` list and the ``Contents`` list reaches 2,000.


Filtering results with JMESPath
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`JMESPath <http://jmespath.org>`_ is a query language for JSON that can be used
Expand Down
Loading