Query with S3 Select

Describes how to query objects with S3 Select from the CLI or from the Object Store Interface.

You can query CSV, JSON, and Apache Parquet files.

Usage Notes

Review the following notes related to the use of S3 Select before you run any queries.
Parquet files
Before you run any queries against Parquet files, add the line export MINIO_API_SELECT_PARQUET=on to the /opt/mapr/conf/env.sh file and restart the Object Store server. You can restart the Object Store server from the Services page in the Control System, or from the CLI by running the following command:
/opt/mapr/bin/maprcli node services -nodes <space-delimited list of node names> -s3server restart
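The env.sh change described above could be scripted along the following lines. This is a minimal sketch, not a product-supplied tool: the function name enable_parquet_select is hypothetical, and the sketch assumes that appending the line to the end of env.sh is acceptable in your environment.

```shell
# Sketch only: add the Parquet setting to env.sh if it is not already there.
# enable_parquet_select is a hypothetical helper name used for illustration.
enable_parquet_select() {
    # $1 = path to the env file; defaults to the path named in this topic.
    env_file="${1:-/opt/mapr/conf/env.sh}"
    setting='export MINIO_API_SELECT_PARQUET=on'

    # Append the line only when an identical line is not already present,
    # so that running the helper twice does not duplicate the setting.
    if ! grep -qxF -- "$setting" "$env_file" 2>/dev/null; then
        printf '%s\n' "$setting" >> "$env_file"
    fi
}
```

To use the sketch, call enable_parquet_select with no argument on the node that runs the Object Store server, and then restart the server as described above.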
JSON documents
When you query a JSON document, you must include the --json-input parameter with the value type=document, as shown in the following example:
/opt/mapr/bin/mc sql --json-input type=document --query "select * from S3Object" alias0/mybucket/example5.json

Using the CLI

Use the mc sql command to query objects.
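As an illustration, a CSV query might look like the following sketch. It assumes that the mc client shipped with Object Store accepts the --csv-input option in the same form as the upstream MinIO client (rd for the record delimiter, fh for header handling, fd for the field delimiter) and that the object is semicolon-delimited with a header row. The names MC and query_csv, and any alias, bucket, or object name you pass in, are placeholders.

```shell
# Sketch only: wrap mc sql for a semicolon-delimited CSV object with a header row.
# MC and query_csv are illustrative names, not part of the product.
MC="${MC:-/opt/mapr/bin/mc}"

query_csv() {
    # $1 = SQL expression, for example "select * from S3Object"
    # $2 = object path, for example alias0/mybucket/data.csv (hypothetical)
    # rd = record delimiter, fh = header handling, fd = field delimiter
    "$MC" sql --csv-input 'rd=\n,fh=USE,fd=;' --query "$1" "$2"
}
```

For example, query_csv "select * from S3Object" alias0/mybucket/data.csv runs the query against a hypothetical data.csv object in the mybucket bucket.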

Using the Object Store Interface

  1. Log in to the Object Store Interface.
  2. Click the bucket icon in the left pane.
  3. From the Buckets page, click the bucket in which the object exists.
  4. Navigate to the Objects tab.
  5. View the list of objects.
  6. Scroll through the list of objects, or enter a name in the search field to search for the object.
  7. Select Query with S3 Select from the Actions menu of the object to query.
  8. Select the characteristics of the object, such as the format, the number of lines that the object spans, the CSV delimiter for the fields, and the compression type, if any.
  9. Select the output type, either CSV or JSON, and the CSV delimiter to use.
  10. Enter the query to run. The default query is SELECT * FROM s3object s LIMIT 5.
  11. Click Run SQL Query.
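
The choices made in steps 8 through 10 correspond roughly to options of the mc sql command. The following sketch runs the default query against a CSV object and requests JSON output. It assumes that the mc client accepts the --csv-input and --json-output options in the same form as the upstream MinIO client; the names MC and run_default_query and the object path are hypothetical.

```shell
# Sketch only: CLI counterpart of the default query in the Object Store Interface.
# MC and run_default_query are illustrative names, not part of the product.
MC="${MC:-/opt/mapr/bin/mc}"

run_default_query() {
    # $1 = object path, for example alias0/mybucket/example.csv (hypothetical)
    # --csv-input describes the object (step 8): header row, semicolon delimiter.
    # --json-output selects the output type (step 9).
    # --query carries the SQL expression (step 10).
    "$MC" sql \
        --csv-input 'rd=\n,fh=USE,fd=;' \
        --json-output 'rd=\n' \
        --query "SELECT * FROM s3object s LIMIT 5" \
        "$1"
}
```

Check the options against the output of mc sql --help for your installed version before relying on them.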