GetItems

Description

Retrieves (reads) attributes of multiple items in a table or in a data container's root directory, according to the specified criteria.

Scan Optimization

GetItems enables you to optimize table scans by using either of the following methods:

Note
  • You can't use both optimization methods together in the same GetItems request.

  • If you're looking for a specific item, use GetItem, which is faster than either of these GetItems optimized-scan methods because it searches for a specific object file on the relevant data slice. See Working with NoSQL Data.

Range Scan

GetItems allows you to perform a range scan to retrieve items with a specific sharding-key value by setting the ShardingKey request parameter to the requested sharding-key value. You can also optionally restrict the query to a specific range of item sorting-key values by using the SortKeyRangeStart and/or SortKeyRangeEnd parameters. A range scan is more efficient than the default GetItems full table scan because of the way that the data is stored and accessed. For more information, see Working with NoSQL Data.

Parallel Scan (Segmented Table Scan)

GetItems scans table items in search of the requested items. By default, the scan is executed sequentially. However, you can optionally scan only a specific portion (segment) of the table: set the request's TotalSegment parameter to the number of segments into which you wish to divide the table, and set the request's Segment parameter to the ID of the segment that you wish to scan in the current operation. To improve performance, you can implement a parallel table scan by dividing the scan among multiple application instances ("workers"), assigning each worker a different segment to scan. Note that such an implementation requires that the workers all send GetItems requests with the same scan criteria and total-segments count but with different scan segments.

The following figure depicts a parallel multi-worker scan of a segmented table with GetItems:

Diagram of a GetItems parallel scan of a segmented table
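
As a rough sketch of the multi-worker pattern described above, the following Python helper builds one request body per segment from a shared set of scan criteria. The helper name build_segment_payloads and the sample criteria are hypothetical; only the Segment and TotalSegment parameter names, and the 1 to 1024 range, come from this reference.

```python
import copy


def build_segment_payloads(base_payload, total_segments):
    """Return one GetItems request body per segment (hypothetical helper)."""
    if not 1 <= total_segments <= 1024:
        raise ValueError("TotalSegment must be between 1 and 1024")
    payloads = []
    for segment in range(total_segments):
        # Every worker shares the same criteria and total-segments count
        # and differs only in the segment ID that it scans.
        payload = copy.deepcopy(base_payload)
        payload["TotalSegment"] = total_segments
        payload["Segment"] = segment
        payloads.append(payload)
    return payloads


# Sample criteria, assumed for illustration only.
payloads = build_segment_payloads(
    {"TableName": "Cars", "FilterExpression": "km >= 10000"}, 4
)
for p in payloads:
    print(p["Segment"], p["TotalSegment"])
```

Each worker would then send its own payload in a separate GetItems request and, if the response is partial, continue with the marker returned for that same segment.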

Partial Response

The GetItems response might not return all the requested items, especially if the overall size of the requested data is considerable. In such cases, the value of the LastItemIncluded response element is "FALSE". To retrieve the remaining requested items, send a new identical GetItems request, and set its Marker parameter to the value of the NextMarker element that was returned in the response of the previous request. When the Marker request parameter is set, the operation begins searching for matching items at the specified marker location.

Note
  • The Limit request parameter defines the maximum number of items to return in the response object for the current API call. When issuing a GetItems request with a new marker, after receiving a partial response, consider recalculating the limit to subtract the items returned in the responses to the previous requests.

  • A GetItems response might contain fewer items than specified in the Limit request parameter even if there are additional table items that match the request (i.e., the value of the LastItemIncluded response element is "FALSE"). In such cases, you need to issue a new GetItems request to retrieve the remaining items, as explained above.

  • Requests that set the Marker parameter must perform a similar scan to that performed by the previous partial-response request — be it a parallel scan, a range scan, or a regular scan. For example, you cannot use the NextMarker response element returned for a previous range-scan request as the value of the Marker parameter of a parallel-scan request.
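
The marker flow described in this section can be sketched as a simple loop. This is a minimal illustration rather than a definitive client: send_request is a hypothetical callable that posts a request body and returns the parsed JSON response as a dictionary, and fake_send is a stand-in used only to exercise the loop without a network call.

```python
def get_all_items(send_request, payload):
    """Collect items across partial responses (hypothetical helper).

    send_request is assumed to take a request-body dict, perform the
    GetItems call, and return the parsed JSON response as a dict.
    """
    items = []
    request = dict(payload)
    while True:
        response = send_request(request)
        items.extend(response.get("Items", []))
        if response.get("LastItemIncluded") == "TRUE":
            return items
        marker = response.get("NextMarker")
        if marker is None:
            raise RuntimeError("partial response without NextMarker")
        # Repeat the identical request, adding only the returned marker.
        request = dict(payload, Marker=marker)


# Stand-in responses, keyed by the Marker value of the incoming request.
_pages = {
    None: {
        "LastItemIncluded": "FALSE",
        "NextMarker": "m1",
        "Items": [{"__name": {"S": "a"}}],
    },
    "m1": {"LastItemIncluded": "TRUE", "Items": [{"__name": {"S": "b"}}]},
}


def fake_send(request):
    return _pages[request.get("Marker")]


all_items = get_all_items(fake_send, {"TableName": "Cars"})
print(len(all_items))
```

The sketch does not adjust Limit between calls; a caller that sets Limit might reduce it by the number of items already collected, as the first note above suggests.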

Request

Request Header

Syntax
    POST /<container>/<resource> HTTP/1.1
    Host: <web-APIs URL>
    Content-Type: application/json
    X-v3io-function: GetItems
    X-v3io-session-key: <access key>
    
    url = "http://<web-APIs URL>/<container>/<resource>"
    headers = {
                "Content-Type": "application/json",
                "X-v3io-function": "GetItems",
                "<Authorization OR X-v3io-session-key>": "<value>"
              }
    
    URL Resource Parameters
    • To retrieve items from a specific table, set the relative table path within the configured container in the request URL or in the TableName JSON parameter, or split the path between the URL and the JSON parameter. See Data-Service Web-API General Structure.
    • To retrieve items from the root directory of the configured container, omit the <resource> URL element — i.e., end the URL in the request header with <container>/ — and either don't set the request's TableName JSON parameter or set it to "/".

    Request Data

    Syntax
      {
          "TableName":          "string",
          "Limit":              number,
          "AttributesToGet":    "string",
          "FilterExpression":   "string",
          "ShardingKey":        "string",
          "SortKeyRangeStart":  "string",
          "SortKeyRangeEnd":    "string",
          "Segment":            number,
          "TotalSegment":       number,
          "Marker":             "string"
      }
      
      payload = {
                  "TableName":          "string",
                  "Limit":              number,
                  "AttributesToGet":    "string",
                  "FilterExpression":   "string",
                  "ShardingKey":        "string",
                  "SortKeyRangeStart":  "string",
                  "SortKeyRangeEnd":    "string",
                  "Segment":            number,
                  "TotalSegment":       number,
                  "Marker":             "string"
                }
      
      Parameters
      TableName

      To retrieve items from a specific table (collection), set the relative table path within the configured container in this parameter or in the request URL, or split the path between the URL and the JSON parameter. See Data-Service Web-API General Structure.

      To retrieve items from the root directory of the configured container, end the URL in the request header with <container>/ and either don't set the TableName JSON parameter or set it to "/".

      • Type: String
      • Requirement: Optional
      Limit

      The maximum number of items to return within the response (i.e., the maximum number of elements in the response object's Items array).

      • Type: Number
      • Requirement: Optional
      AttributesToGet

      The attributes to return for each item.

      • Type: String
      • Requirement: Optional
      • Default Value: "*"

      The attributes to return can be depicted in one of the following ways:

      • A comma-separated list of attribute names.
        Note: Currently, the delimiter commas cannot be surrounded by spaces.

        The attributes can be of any attribute type — user, system, or hidden.

      • "*" — retrieve the item's user attributes and __name system attribute, but not other system attributes or hidden attributes. This is the default value.

      • "**" — retrieve all item attributes — user, system, and hidden attributes.

      For an overview of the different attribute types, see Attribute Types.
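
Because the delimiter commas cannot be surrounded by spaces, it can help to build the AttributesToGet string programmatically. The helper below is a small sketch; the name attributes_to_get is hypothetical.

```python
def attributes_to_get(names):
    """Join attribute names with bare commas (hypothetical helper)."""
    cleaned = [name.strip() for name in names]
    if any(not name or "," in name for name in cleaned):
        raise ValueError("attribute names must be non-empty and comma-free")
    # No spaces around the commas, as the note above requires.
    return ",".join(cleaned)


attrs = attributes_to_get(["__name", " km", "state ", "manufacturer"])
print(attrs)  # prints __name,km,state,manufacturer
```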

      FilterExpression

      A filter expression that restricts the items to retrieve. Only items that match the filter criteria are returned. See filter expression.

      • Type: String
      • Requirement: Optional
      ShardingKey

      The sharding-key value of the items to get by using a range scan. The sharding-key value is the part to the left of the leftmost period in a compound primary-key value (item name). You can optionally use the SortKeyRangeStart and/or SortKeyRangeEnd request parameters to restrict the search to a specific range of sorting keys (SortKeyRangeStart <= <sorting key> < SortKeyRangeEnd).

      Note

      To retrieve all items for an original sharding-key value that was recalculated during the ingestion (to achieve a more even workload distribution), you need to repeat the GetItems request for each of the sharding-key values that were used in the ingestion. If the ingestion was done by using the even-distribution option of the NoSQL Spark DataFrame, you need to repeat the request with ShardingKey values that range from <original sharding key>_1 to <original sharding key>_<n>, where <n> is the value of the v3io.kv.range-scan.hashing-bucket-num configuration property (default = 64); for example, johnd_1 .. johnd_64. For more information, see Recalculating Sharding-Key Values for Even Workload Distribution.

      • Type: String
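
For the even-distribution case described in the note above, the list of ShardingKey values to query can be generated as follows. This is a sketch under the stated default of 64 buckets; the helper name recalculated_sharding_keys is hypothetical, and num_buckets should match the v3io.kv.range-scan.hashing-bucket-num value that was used during ingestion.

```python
def recalculated_sharding_keys(original_key, num_buckets=64):
    """List <original key>_1 .. <original key>_<n> (hypothetical helper)."""
    return [f"{original_key}_{i}" for i in range(1, num_buckets + 1)]


keys = recalculated_sharding_keys("johnd")
# One GetItems request body per recalculated sharding-key value.
range_scan_payloads = [{"ShardingKey": key} for key in keys]
print(keys[0], keys[-1], len(keys))  # prints johnd_1 johnd_64 64
```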
      SortKeyRangeStart

      The minimal sorting-key value of the items to get by using a range scan. The sorting-key value is the part to the right of the leftmost period in a compound primary-key value (item name). This parameter is applicable only together with the ShardingKey request parameter. The scan will return all items with the specified sharding-key value whose sorting-key values are greater than or equal to (>=) the value of the SortKeyRangeStart parameter and less than (<) the value of the SortKeyRangeEnd parameter (if set).

      • Type: String
      • Requirement: Optional
      SortKeyRangeEnd

      The maximal sorting-key value of the items to get by using a range scan. The sorting-key value is the part to the right of the leftmost period in a compound primary-key value (item name). This parameter is applicable only together with the ShardingKey request parameter. The scan will return all items with the specified sharding-key value whose sorting-key values are greater than or equal to (>=) the value of the SortKeyRangeStart parameter (if set) and less than (<) the value of the SortKeyRangeEnd parameter.

      • Type: String
      • Requirement: Optional
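
The range is half-open: the start value is included and the end value is excluded. The client-side sketch below illustrates this with fixed-width date strings. It assumes plain string comparison, which matches numeric order for such values; the platform's exact ordering rules are not described in this section, and the function name in_sort_key_range is hypothetical.

```python
def in_sort_key_range(sort_key, start=None, end=None):
    """Check start <= sort_key < end, skipping an unset bound (sketch)."""
    if start is not None and sort_key < start:
        return False
    if end is not None and sort_key >= end:
        return False
    return True


dates = ["20180601", "20180602", "20180701"]
matches = [d for d in dates if in_sort_key_range(d, "20180101", "20180701")]
print(matches)  # prints ['20180601', '20180602']
```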
      Segment

      The ID of a specific table segment to scan — 0 to one less than TotalSegment. See Parallel Scan.

      • Type: Number
      TotalSegment

      The number of segments into which to divide the table scan — 1 to 1024. See Parallel Scan. The segments are assigned sequential IDs starting with 0.

      • Type: Number
      • Requirement: Required when Segment is provided

      Marker

      An opaque identifier that was returned in the NextMarker element of a response to a previous GetItems request that did not return all the requested items. This marker identifies the location in the table from which to start searching for the remaining requested items. See Partial Response and the description of the NextMarker response element.

      • Type: String
      • Requirement: Optional

      Response

      Response Data

      Syntax
      {
          "LastItemIncluded":   "string",
          "NumItems":           number,
          "NextMarker":         "string",
          "Items": [
              {
                  "string": {
                      "S":      "string",
                      "N":      "string",
                      "BOOL":   Boolean,
                      "B":      "blob"
                  }
              }
          ]
      }
      
      Elements
      LastItemIncluded

      "TRUE" if the scan completed successfully — the entire table was scanned for the requested items and all relevant items were returned (possibly in a previous response — see Partial Response); "FALSE" otherwise.

      • Type: Boolean string — "TRUE" or "FALSE"
      NumItems

      The number of items in the response's Items array.

      • Type: Number
      NextMarker

      An opaque identifier that marks the location in the table at which to start searching for remaining items in the next call to GetItems. See Partial Response and the description of the Marker request parameter. When the response contains all the requested items, NextMarker is not returned.

      • Type: String
      Items

      An array of items containing the requested attributes. The array contains information only for items that satisfy the conditions of the FilterExpression request parameter. Each returned item object includes only the attributes requested in the AttributesToGet parameter, provided the item has these attributes.

      • Type: An array of item JSON objects that contain Attribute objects
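
Each returned attribute is wrapped in an object keyed by its type, and numeric values arrive as strings. The following sketch flattens one item into plain Python values. The function names are hypothetical, and "B" values are passed through unchanged because their encoding is not described in this section.

```python
def decode_attribute(attr):
    """Unwrap a single Attribute object (hypothetical helper)."""
    if "S" in attr:
        return attr["S"]
    if "N" in attr:
        text = attr["N"]
        try:
            return int(text)
        except ValueError:
            return float(text)
    if "BOOL" in attr:
        return attr["BOOL"]
    if "B" in attr:
        return attr["B"]  # left undecoded; encoding is an open assumption
    raise ValueError(f"unrecognized attribute object: {attr!r}")


def decode_item(item):
    """Map attribute names to unwrapped values for one item."""
    return {name: decode_attribute(value) for name, value in item.items()}


item = {
    "__name": {"S": "24.20180601"},
    "driver_id": {"N": "24"},
    "avg_ride_km": {"N": "41.5"},
}
decoded = decode_item(item)
print(decoded)
```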

      Examples

      Example 1 — Basic Filter-Expression Scan

      Retrieve from a "MyDirectory/Cars" table in a "mycontainer" container the __name, km, state, and manufacturer attributes (if they exist) of up to 1,000 items whose km attribute value is greater than or equal to 10,000, and whose lastService attribute value is less than 10,000:

      Request
        POST /mycontainer/MyDirectory/ HTTP/1.1
        Host: https://default-tenant.app.mycluster.iguazio.com:8443
        Content-Type: application/json
        X-v3io-function: GetItems
        X-v3io-session-key: e8bd4ca2-537b-4175-bf01-8c74963e90bf
        
        {
            "TableName":        "Cars",
            "Limit":            1000,
            "AttributesToGet":  "__name,km,state,manufacturer",
            "FilterExpression": "(km >= 10000) AND (lastService < 10000)"
        }
        
        import requests
        
        url = "https://default-tenant.app.mycluster.iguazio.com:8443/mycontainer/MyDirectory/"
        headers = {
                    "Content-Type": "application/json",
                    "X-v3io-function": "GetItems",
                    "X-v3io-session-key": "e8bd4ca2-537b-4175-bf01-8c74963e90bf"
                  }
        payload = {
                    "TableName":        "Cars",
                    "Limit":            1000,
                    "AttributesToGet":  "__name,km,state,manufacturer",
                    "FilterExpression": "(km >= 10000) AND (lastService < 10000)"
                  }
        
        response = requests.post(url, json=payload, headers=headers)
        print(response.text)
        
        
        Response
        HTTP/1.1 200 OK
        Content-Type: application/json
        ...
        
        {
            "LastItemIncluded": "TRUE",
            "NumItems": 4,
            "Items": [
                {
                    "__name": {"S": "7348841"},
                    "km": {"N": "10000"},
                    "state": {"S": "OK"}
                },
                {
                    "__name": {"S": "6924123"},
                    "km": {"N": "15037"},
                    "state": {"S": "OUT_OF_SERVICE"},
                    "manufacturer": {"S": "Honda"}
                },
                {
                    "__name": {"S": "7222751"},
                    "km": {"N": "12503"}
                },
                {
                    "__name": {"S": "5119003"},
                    "km": {"N": "11200"},
                    "manufacturer": {"S": "Toyota"}
                }
            ]
        }
        

        Example 2 — Range Scan

        This example demonstrates two range-scan queries for a "mytaxis/rides" table in a "mycontainer" container. The table contains the following items:

        +---------+--------+---------+--------+----------------+------------------+-------------------+
        |driver_id|    date|num_rides|total_km|total_passengers|       avg_ride_km|avg_ride_passengers|
        +---------+--------+---------+--------+----------------+------------------+-------------------+
        |        1|20180601|       25|   125.0|              40|               5.0|                1.6|
        |        1|20180602|       20|   106.0|              46|               5.3|                2.3|
        |        1|20180701|       28|   106.4|              42|3.8000000000000003|                1.5|
        |       16|20180601|        1|   224.2|               8|             224.2|                8.0|
        |       16|20180602|       10|   244.0|              45|              24.4|                4.5|
        |       16|20180701|        6|   193.2|              24|32.199999999999996|                4.0|
        |       24|20180601|        8|   332.0|              18|              41.5|               2.25|
        |       24|20180602|        5|   260.0|              11|              52.0|                2.2|
        |       24|20180701|        7|   352.1|              21|50.300000000000004|                3.0|
        +---------+--------+---------+--------+----------------+------------------+-------------------+
        
        Request

        The first query scans for all attributes of the items whose sharding-key value is 1:

          POST /mycontainer/mytaxis/rides/ HTTP/1.1
          Host: https://default-tenant.app.mycluster.iguazio.com:8443
          Content-Type: application/json
          X-v3io-function: GetItems
          X-v3io-session-key: e8bd4ca2-537b-4175-bf01-8c74963e90bf
          
          {
              "ShardingKey":      "1",
              "AttributesToGet":  "*"
          }
          
          import requests
          
          url = "https://default-tenant.app.mycluster.iguazio.com:8443/mycontainer/mytaxis/rides/"
          headers = {
                      "Content-Type": "application/json",
                      "X-v3io-function": "GetItems",
                      "X-v3io-session-key": "e8bd4ca2-537b-4175-bf01-8c74963e90bf"
                    }
          payload = {
                      "ShardingKey":      "1",
                      "AttributesToGet":  "*"
                    }
          
          response = requests.post(url, json=payload, headers=headers)
          print(response.text)
          
          

          The second query scans for the __name, driver_id, date, avg_ride_km, and avg_ride_passengers attributes of all items whose sharding-key value is 24 and whose sorting-key values are within the first six months of 2018:

            POST /mycontainer/mytaxis/rides/ HTTP/1.1
            Host: https://default-tenant.app.mycluster.iguazio.com:8443
            Content-Type: application/json
            X-v3io-function: GetItems
            X-v3io-session-key: e8bd4ca2-537b-4175-bf01-8c74963e90bf
            
            {
                "ShardingKey":        "24",
                "SortKeyRangeStart":  "20180101",
                "SortKeyRangeEnd":    "20180701",
                "AttributesToGet":    "__name,driver_id,date,avg_ride_km,avg_ride_passengers"
            }
            
            import requests
            
            url = "https://default-tenant.app.mycluster.iguazio.com:8443/mycontainer/mytaxis/rides/"
            headers = {
                        "Content-Type": "application/json",
                        "X-v3io-function": "GetItems",
                        "X-v3io-session-key": "e8bd4ca2-537b-4175-bf01-8c74963e90bf"
                      }
            payload = {
                        "ShardingKey":        "24",
                        "SortKeyRangeStart":  "20180101",
                        "SortKeyRangeEnd":    "20180701",
                        "AttributesToGet":    "__name,driver_id,date,avg_ride_km,avg_ride_passengers"
                      }
            
            response = requests.post(url, json=payload, headers=headers)
            print(response.text)
            
            
            Response

            Response to the first query —

            HTTP/1.1 200 OK
            Content-Type: application/json
            ...
            
            {
                "LastItemIncluded": "TRUE",
                "NumItems": 3,
                "Items": [
                    {
                        "__name": {
                            "S": "1.20180601"
                        },
                        "avg_ride_km": {
                            "N": "5"
                        },
                        "total_passengers": {
                            "N": "40"
                        },
                        "driver_id": {
                            "N": "1"
                        },
                        "avg_ride_passengers": {
                            "N": "1.6"
                        },
                        "total_km": {
                            "N": "125"
                        },
                        "date": {
                            "S": "20180601"
                        },
                        "num_rides": {
                            "N": "25"
                        }
                    },
                    {
                        "__name": {
                            "S": "1.20180602"
                        },
                        "avg_ride_km": {
                            "N": "5.3"
                        },
                        "total_passengers": {
                            "N": "46"
                        },
                        "driver_id": {
                            "N": "1"
                        },
                        "avg_ride_passengers": {
                            "N": "2.3"
                        },
                        "total_km": {
                            "N": "106"
                        },
                        "date": {
                            "S": "20180602"
                        },
                        "num_rides": {
                            "N": "20"
                        }
                    },
                    {
                        "__name": {
                            "S": "1.20180701"
                        },
                        "avg_ride_km": {
                            "N": "3.8"
                        },
                        "total_passengers": {
                            "N": "42"
                        },
                        "driver_id": {
                            "N": "1"
                        },
                        "avg_ride_passengers": {
                            "N": "1.5"
                        },
                        "total_km": {
                            "N": "106.4"
                        },
                        "date": {
                            "S": "20180701"
                        },
                        "num_rides": {
                            "N": "28"
                        }
                    }
                ]
            }
            

            Response to the second query —

            HTTP/1.1 200 OK
            Content-Type: application/json
            ...
            
            {
                "LastItemIncluded": "TRUE",
                "NumItems": 2,
                "Items": [
                    {
                        "__name": {
                            "S": "24.20180601"
                        },
                        "driver_id": {
                            "N": "24"
                        },
                        "date": {
                            "S": "20180601"
                        },
                        "avg_ride_km": {
                            "N": "41.5"
                        },
                        "avg_ride_passengers": {
                            "N": "2.25"
                        }
                    },
                    {
                        "__name": {
                            "S": "24.20180602"
                        },
                        "driver_id": {
                            "N": "24"
                        },
                        "date": {
                            "S": "20180602"
                        },
                        "avg_ride_km": {
                            "N": "52"
                        },
                        "avg_ride_passengers": {
                            "N": "2.2"
                        }
                    }
                ]
            }