boto · shepazon · Apr 1, 2024 · Apr 3, 2024
diff --git a/docs/source/topics/paginators.rst b/docs/source/topics/paginators.rst
@@ -1,4 +1,4 @@
-Botocore Paginators
+Botocore paginators
 ===================
 
 Some AWS operations return results that are incomplete and require subsequent
@@ -13,15 +13,17 @@ results.
 process of iterating over an entire result set of a truncated API operation.
 
 
-Creating Paginators
+Creating paginators
 -------------------
 
 Paginators are created via the ``get_paginator()`` method of a botocore
 client. The ``get_paginator()`` method accepts an operation name and returns
 a reusable ``Paginator`` object. You then call the ``paginate`` method of the
 Paginator, passing in any relevant operation parameters to apply to the
 underlying API operation. The ``paginate`` method then returns an iterable
-``PageIterator``::
+``PageIterator``:
+
+.. code-block:: python
 
     import botocore.session
 
@@ -39,13 +41,15 @@ underlying API operation. The ``paginate`` method then returns an iterable
         print(page['Contents'])
 
 
-Customizing Page Iterators
+Customizing page iterators
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 You must call the ``paginate`` method of a Paginator in order to iterate over
 the pages of API operation results. The ``paginate`` method accepts a
 ``PaginationConfig`` named argument that can be used to customize the
-pagination::
+pagination.
+
+.. code-block:: python
 
     paginator = client.get_paginator('list_objects')
     page_iterator = paginator.paginate(Bucket='my-bucket',
@@ -73,11 +77,13 @@ pagination::
 Filtering results
 -----------------
 
-Many Paginators can be filtered server-side with options that are passed
+Many paginators can be filtered server-side with options that are passed
 through to each underlying API call. For example,
 :py:meth:`S3.Paginator.list_objects.paginate` accepts a ``Prefix`` parameter
 used to filter the paginated results by prefix server-side before sending them
-to the client::
+to the client.
+
+.. code-block:: python
 
     import botocore.session
     session = botocore.session.get_session()
@@ -90,7 +96,78 @@ to the client::
         print(page['Contents'])
 
 
-Filtering Results with JMESPath
+Prefixes, delimiters, and `MaxItems`
+------------------------------------
+
+When using ``list_objects`` with a delimiter and the ``MaxItems`` option, the
+``MaxItems`` limit only applies to the objects matched and returned in the
+``Contents`` list. This means that while the total number of objects
+enumerated in ``Contents`` will be no greater than ``MaxItems`` items, there
+may be values in ``CommonPrefixes`` beyond that limit.
+
+For example, picture a scenario in which a bucket contains 20,000 objects,
+each with a different prefix, using the slash character ("/") as a delimiter:
+
+* ``bucket-name/prefix1/key1``
+* ``bucket-name/prefix2/key2``
+* ...
+* ``bucket-name/prefixN/keyN``
+
+With that in mind, consider what happens when the following code runs:
+
+.. code-block:: python
+
+    num_prefixes = 0
+    num_keys = 0
+
+    session = botocore.session.get_session()
+    s3 = session.client_create('s3')
+    paginator = s3.get_paginator('list_objects_v2')
+
+    for result in paginator.paginate(
+                Bucket='bucket-name', Delimiter='/',
+                PaginationConfig={'MaxItems': 2000}):
+        for prefix in result.get('CommonPrefixes', []):
+            num_prefixes += 1
+        for item in result.get('Contents', []):
+            num_keys += 1
+
+This code iterates over the 20,000 objects, limiting the total number of objects
+listed to 2,000. Because the results include the 20,000 common prefixes, this
+paginator runs far longer than expected, since it still processes all 20,000
+common prefixes despite the value of ``MaxItems``.
+
+To process a maximum number of total items, track the total number of results
+and when it reaches the limit, break out of the paginator's loop.
+
+.. code-block:: python
+
+    num_prefixes = 0
+    num_keys = 0
+
+    session = botocore.session.get_session()
+    s3 = session.create_client('s3')
+    paginator = s3.get_paginator('list_objects_v2')
+
+    for result in paginator.paginate(
+                Bucket='bucket-name', Delimiter='/'):
+        prefixes = result.get('CommonPrefixes', [])
+        keys = result.get('Contents', [])
+
+        num_prefixes += len(prefixes)
+        num_keys += len(keys)
+        if num_prefixes + num_keys > 2000:
+            break
+
+        for prefix in prefixes:
+            print(f"Prefix: {prefix['Prefix']}")
+        for key in keys:
+            print(f"Key:    {key['Key']}")
+
+This will stop pagination when the combined size of the ``CommonPrefixes`` list and the ``Contents`` list reaches 2,000.
+
+
+Filtering results with JMESPath
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 `JMESPath <http://jmespath.org>`_ is a query language for JSON that can be used