The Describe Dataset tool provides an overview of your big data. By default, the tool outputs a table layer containing summaries of your field values and an overview of your geometry and time settings for the input layer. Optionally, the tool can output a feature layer representing a sample of your input features, or a single polygon feature layer that represents the extent of your input features. You can choose to output one, both, or none.
Analysis using GeoAnalytics Tools
Analysis using GeoAnalytics Tools is run using distributed processing across multiple ArcGIS GeoAnalytics Server machines and cores. GeoAnalytics Tools and standard feature analysis tools in ArcGIS Enterprise have different parameters and capabilities. To learn more about these differences, see Feature analysis tool differences.
Examples
- Verify that you correctly registered time and geometry with your big data file share.
- Understand attribute values with summarized field statistics.
- Visualize your big data with a sample layer. Instead of drawing a million features, draw a sample.
- Run workflows using a sample of the data before scaling for longer and larger processing.
- Determine where a dataset is located by calculating its geographic extent.
Usage notes
Browse to the tabular, point, line, or area feature layer or big data file share dataset you want to describe using the Choose dataset to describe option.
Output a subset of your data by clicking the Sample layer button and specifying the number of features in the value picker that appears. The output subset will always have the same schema, geometry, and time settings as the input features. Use the subset to understand how your big data appears when added to a map or visualized in an attribute table. Additionally, you can run analysis on the sample dataset to determine the best inputs for larger analysis on your entire dataset.
Output a boundary feature that describes the extent of your input dataset by selecting Extent layer. The output is always a single rectangle feature representing the geographic extent of the input features. Use the extent layer to understand where your data is located, or use it as input elsewhere in your workflow. For example, use it as the area layer that features are clipped to in the Clip Layer GeoAnalytics tool.
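As a rough illustration of what the extent layer represents, the following Python sketch computes a bounding rectangle from point coordinates and closes it into a polygon ring. It is not the GeoAnalytics implementation. The helper names `extent_of` and `extent_rectangle` are hypothetical, and the sketch assumes simple (x, y) point tuples in a single spatial reference. For lines or polygons, the same idea applies to all of their vertices.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def extent_of(points: Iterable[Point]) -> Extent:
    """Return the minimum and maximum x and y of the points (hypothetical helper)."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot compute the extent of an empty dataset")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def extent_rectangle(ext: Extent) -> List[Point]:
    """Turn an extent into a closed, clockwise rectangle ring (first point repeated last)."""
    xmin, ymin, xmax, ymax = ext
    return [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)]


# Illustrative coordinates only
print(extent_of([(1.0, 5.0), (-2.0, 3.0), (4.0, -1.0)]))  # (-2.0, -1.0, 4.0, 5.0)
```

The ring is ordered clockwise because Esri JSON treats clockwise rings as exterior rings.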
If Use current map extent is checked, only the features within the current map extent are analyzed. If it is not checked, all features in the input layer are analyzed, even those outside the current map extent. For example, if you choose to output a sample layer and Use current map extent is not checked, the sample is drawn from the entire dataset. If you choose to output an extent layer with Use current map extent checked, the output boundary represents the map extent.
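The following sketch models how Use current map extent changes what the tool works with, assuming features are plain (x, y) points and extents are (xmin, ymin, xmax, ymax) tuples. The names `inside`, `select_input`, `take_sample`, and `describe_extent` are hypothetical. Taking the first n selected features as the sample is also an assumption: this page only states that the sample is not random (see Limitations), not how it is chosen.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def inside(pt: Point, ext: Extent) -> bool:
    """True if the point falls within the extent, boundary included."""
    x, y = pt
    xmin, ymin, xmax, ymax = ext
    return xmin <= x <= xmax and ymin <= y <= ymax


def select_input(points: List[Point], map_extent: Optional[Extent] = None) -> List[Point]:
    """Option checked (map_extent given): keep features in view. Unchecked: keep all."""
    if map_extent is None:
        return list(points)
    return [p for p in points if inside(p, map_extent)]


def take_sample(points: List[Point], n: int) -> List[Point]:
    """Assumption: the first n selected features, not a random draw."""
    return points[:n]


def describe_extent(points: List[Point], map_extent: Optional[Extent] = None) -> Extent:
    """With the option checked, the output boundary is the map extent itself."""
    if map_extent is not None:
        return map_extent
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


features = [(0.0, 0.0), (5.0, 5.0), (20.0, 20.0)]
view = (-1.0, -1.0, 10.0, 10.0)
print(take_sample(select_input(features, view), 2))  # [(0.0, 0.0), (5.0, 5.0)]
print(describe_extent(features))                       # (0.0, 0.0, 20.0, 20.0)
```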
By default, the tool outputs a table containing summary statistics for each field and a JSON string describing the properties of the input layer. To access the JSON string, click the Show Result button that appears when you hover over the summary statistics table layer in the table of contents.
The JSON string includes the following information:
- datasetName—The name of the dataset being described.
- datasetSource—The storage location of the input dataset. This value could be ArcGIS Data Store — Relational, ArcGIS Data Store — Spatiotemporal, or Big Data File Share - <your_bdfs_name>.
- recordCount—The total number of records in the input dataset.
- geometry—The geometry settings of your input layer, which include the following:
  - geometryType—The type of geometry your input features represent. This value could be Point, Line, Polygon, or Table.
  - sref—The spatial reference your input features use. For example, this value could be {"wkid": 26972}, where 26972 is the spatial reference ID.
  - countNonEmpty—The number of features with a valid geometry.
  - countEmpty—The number of features without a valid geometry.
  - spatialExtent—The geographical extent of your features represented by the minimum and maximum coordinate values.
- time—The time settings of your input layer, which include the following:
  - timeType—The type of time your input features represent. This value could be Instant, Interval, or None.
  - countNonEmpty—The number of features with a valid time.
  - countEmpty—The number of features without a valid time.
  - temporalExtent—The temporal extent of your features represented by the minimum and maximum time values.
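The JSON string is a convenient way to check the first use case in Examples, verifying that geometry and time were registered correctly. The following Python sketch parses a result string and flags features without a valid geometry or time. It assumes the key nesting shown in the list above; the sample string and all of its values are hypothetical, and the spatialExtent and temporalExtent entries are left out because their exact layout is not shown on this page. The helper name `check_registration` is also hypothetical.

```python
import json

# Hypothetical result string: keys follow the documented list, values are invented.
result_json = """
{
  "datasetName": "Hurricanes",
  "datasetSource": "Big Data File Share - my_bdfs",
  "recordCount": 1000,
  "geometry": {"geometryType": "Point", "sref": {"wkid": 4326},
               "countNonEmpty": 998, "countEmpty": 2},
  "time": {"timeType": "Instant", "countNonEmpty": 1000, "countEmpty": 0}
}
"""


def check_registration(desc: dict) -> list:
    """Return readable warnings about missing geometry or time values."""
    warnings = []
    geometry = desc.get("geometry", {})
    if geometry.get("countEmpty", 0) > 0:
        warnings.append(f"{geometry['countEmpty']} features lack a valid geometry")
    time_info = desc.get("time", {})
    # timeType may be reported as None when no time is registered
    if time_info.get("timeType") in (None, "None"):
        warnings.append("no time registered")
    elif time_info.get("countEmpty", 0) > 0:
        warnings.append(f"{time_info['countEmpty']} features lack a valid time")
    return warnings


print(check_registration(json.loads(result_json)))  # ['2 features lack a valid geometry']
```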
Learn more about time settings and big data file share datasets
Learn more about geometry settings and big data file share datasets
Limitations
The sample layer is not a truly random geographic selection and should not be used to understand the geographic extent or distribution of your data. For example, if you specify 230 for Number of features to include, the result can contain any 230 of the input features, in any order and from any location.
How Describe Dataset works
Calculations
Summary statistics are calculated for each field in the input layer, and the statistics returned depend on the field type. The following soil depth example outlines how statistics are calculated for each field type:
Numeric statistic | Calculated result |
---|---|
Count | Count of: |
Sum | |
Minimum | Minimum of: |
Maximum | Maximum of: |
Mean | |
Range | |
Variance | |
Standard Deviation | |
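To make the numeric calculations concrete, the following Python sketch computes the same set of statistics for a list in which None stands for a null value. It is an illustration, not the GeoAnalytics code: the name `numeric_stats` is hypothetical, and it uses the sample variance (dividing by n - 1), which is an assumption because this page does not say whether sample or population variance is used. The input is the list from the count note on this page, not the original soil depth values.

```python
import math
from typing import Optional, Sequence


def numeric_stats(values: Sequence[Optional[float]]) -> dict:
    """Summarize a numeric field, ignoring null (None) values."""
    present = [v for v in values if v is not None]  # nonempty values only
    n = len(present)
    if n == 0:
        return {"count": 0}
    total = sum(present)
    mean = total / n
    low, high = min(present), max(present)
    # Assumption: sample variance (n - 1); a single value gets a variance of 0.
    variance = sum((v - mean) ** 2 for v in present) / (n - 1) if n > 1 else 0.0
    return {
        "count": n,
        "sum": total,
        "min": low,
        "max": high,
        "mean": mean,
        "range": high - low,
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


stats = numeric_stats([0, 1, 10, 5, None, 6])
print(stats["count"], stats["sum"], stats["range"])  # 5 22 10
```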
Date statistic | Calculated result |
---|---|
Count | Count of: |
Minimum | Minimum of: |
Maximum | Maximum of: |
Range | |
Note:
Results stored in the ArcGIS Data Store are always stored as milliseconds since the epoch (January 1, 1970) in Coordinated Universal Time (UTC). For example, 1538713350000 milliseconds is equivalent to Friday, October 5, 2018, 4:22:30 AM UTC (GMT).
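The date statistics and the epoch conversion above can be checked with Python's standard library. In this sketch, `ms_to_utc` and `date_stats` are hypothetical helpers; date values are assumed to be integers in milliseconds since the epoch (UTC), as the note describes, with None for nulls. Adding a `timedelta` to the epoch, rather than calling `datetime.fromtimestamp`, avoids the errors some platforms raise for dates before 1970.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms: int) -> datetime:
    """Convert milliseconds since the epoch to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def date_stats(values: Sequence[Optional[int]]) -> dict:
    """Count, minimum, maximum, and range of epoch-millisecond dates, ignoring nulls."""
    present = [v for v in values if v is not None]
    if not present:
        return {"count": 0}
    low, high = min(present), max(present)
    return {
        "count": len(present),
        "min": ms_to_utc(low),
        "max": ms_to_utc(high),
        "range": timedelta(milliseconds=high - low),
    }


print(ms_to_utc(1538713350000))  # 2018-10-05 04:22:30+00:00
```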
String statistic | Calculated result |
---|---|
Count | ["high", "high", "high", "low", null] = 4 |
Any | = "low" |
Note:
The count statistic (for strings and numeric fields) counts the number of nonempty values. The count of [0, 1, 10, 5, null, 6] = 5. The count of [Primary, Primary, Secondary, null] = 3.
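The count rule fits in a few lines of Python. This sketch treats only None as empty; whether the tool also treats empty strings as empty is not stated here, so that part is an assumption. A truthiness test such as `if v` would wrongly skip the numeric value 0, which is why the check is `is not None`. The helper name `count_nonempty` is hypothetical.

```python
from typing import Any, Optional, Sequence


def count_nonempty(values: Sequence[Optional[Any]]) -> int:
    """Count values that are not null (None); 0 and other falsy numbers still count."""
    return sum(1 for v in values if v is not None)


print(count_nonempty([0, 1, 10, 5, None, 6]))                    # 5
print(count_nonempty(["Primary", "Primary", "Secondary", None]))  # 3
```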
ArcGIS API for Python example
The Describe Dataset tool is available through ArcGIS API for Python.
This example describes a hurricane tracking dataset in a big data file share and outputs a subset of 200 hurricane features and an extent layer.
# Import the required ArcGIS API for Python modules
import sys
import arcgis
from arcgis import geoanalytics as ga
from arcgis.gis import GIS
# Connect to your ArcGIS Enterprise portal and check that GeoAnalytics is supported
portal = GIS("https://myportal.domain.com/portal", "gis_publisher", "my_password", verify_cert=False)
if not portal.geoanalytics.is_supported():
    print("Quitting, GeoAnalytics is not supported")
    sys.exit(1)
# Find the big data file share dataset you're interested in using for analysis
search_result = portal.content.search("", "Big Data File Share")
# Look through search results for a big data file share with the matching name
bd_file = next(x for x in search_result if x.title == "bigDataFileShares_NaturalDistasters")
# Look through the big data file share for Hurricanes
hurricanes = next(x for x in bd_file.layers if x.properties.name == "Hurricanes")
# Run the tool Describe Dataset
result = ga.summarize_data.describe_dataset(input_layer=hurricanes, sample_size=200, extent_output=True, output_name="Hurricanes_describe")
# Visualize the sample and extent layers if you are running Python in a Jupyter Notebook
processed_map = portal.map()
processed_map.add_layer(result)
processed_map
Similar tools
Use Describe Dataset when you want to explore your data using samples, statistics, and summarization. Other tools may be useful in solving similar but slightly different problems.
Map Viewer analysis tools
Aggregate your dataset into bins or areas and output summary statistics using the Aggregate Points ArcGIS GeoAnalytics Server tool.
Create a subset of your data within a certain area using the Clip Layer ArcGIS GeoAnalytics Server tool.
ArcGIS Desktop analysis tools
To run this tool from ArcGIS Pro, your active portal must be ArcGIS Enterprise 10.7 or later. You must sign in with an account that has privileges to perform GeoAnalytics Feature Analysis.