Big data file shares are registered as a data store through ArcGIS Server Manager on your ArcGIS GeoAnalytics Server. A big data file share requires a manifest that outlines the schema of the input data, as well as the fields and formats that represent geometry and time in each dataset. The manifest is generated automatically when you register a big data file share. You may need to modify it if your data changes or if manifest generation could not determine all the required information, for example, if the automatically generated manifest did not select the correct field for geometry or time.
A big data file share may optionally have output templates, which outline the format of results written to the big data file share. The output templates are generated when you register a big data file share and choose to use it as an output location. You may need to modify one or more templates, for example, to change the format of the time and geometry fields, or you may want to add or delete a template.
You can view and edit the datasets and manifest information, as well as the output templates, through ArcGIS Server Manager on your ArcGIS GeoAnalytics Server.
Edit a big data file share
Once you have registered a big data file share, you can view and edit attributes and settings for that item's registered datasets by opening the big data file share manifest editor. You can also edit attributes and settings for the optional output templates, which outline how output results will be written to the big data file share.
For example, for input data, you may want to verify the number of datasets within a registered file share. If, in doing so, you do not see the expected number of datasets in the registered file share, you should check whether the registered location contains valid datasets.
For an output template, you may want to format a delimited file output to write a tab-delimited file and use WKT to store the geometry.
You may also want to review dataset schemas for a registered big data file share. You can modify a selected dataset's schema by updating its geometry, time definition, and field names in its associated manifest resource.
On the Advanced tab of the big data file share manifest editor, you can upload a hints file to provide information about a dataset, such as the presence or absence of a header row, encoding, field delimiter, or record terminator. When you regenerate the manifest after uploading a hints file, the information in the hints file is used to generate the manifest.
Optionally, you can download the manifest, edit it, and upload the edited file.
Edit big data file share input datasets
In the big data file share manifest editor, you can view a selected big data file share and the datasets that have been successfully registered within it. When selecting a dataset from the editor drop-down menu, the corresponding parameters are populated. For details about each option on this dialog box, see editing parameters in big data file shares. To edit dataset parameters, do the following:
- On the Registered Data Stores dialog box, locate the big data file share you want to edit.
- Click the Edit pencil to see details and options for corresponding datasets.
- Click the Datasets tab to show the registered datasets and their corresponding parameters.
- Select a dataset from the drop-down menu to view the information represented in its manifest. Make updates to your dataset properties as needed.
- When you have finished editing dataset properties, click Save.
Edit a big data file share manifest or hints file
On the Advanced tab of the big data file share editor, you can edit the associated manifest or hints file by choosing its respective tab. If you upload a manifest, it will overwrite any changes you have made to your big data file share manifest in the editor, and replace the current manifest. To learn more about the big data file share manifest, see Understanding a big data file share manifest. To learn more about using a hints file, see Understanding the hints file. To edit a big data file share manifest or hints file, do the following:
- On the Registered Data Stores dialog box, locate the big data file share you want to modify.
- Click the Edit pencil to see options for modifying the manifest resource.
- Click the Advanced tab.
- From the Advanced tab, choose the Manifest or Hints tab, depending on which you are modifying.
- To download the manifest file, click Manifest > Download.
- To download the hints file, click Hints > Download.
- Use a text editor to modify the downloaded .json manifest file or .dat hints file, and save your changes locally.
Tip:
The default file format for the hints file is .dat. Once you've downloaded the file, you can change its extension to .txt and edit the file.
- To upload an edited file, click the Edit pencil for the big data file share you want to modify.
- To edit the manifest, click Advanced > Manifest > Upload and browse to the updated .json file.
- To edit the hints file, click Advanced > Hints > Upload and browse to the updated .txt file.
- Click Upload.
If you upload a hints file, be sure to regenerate the manifest. When you regenerate a manifest, only datasets with hints or new datasets will be updated, and changes made to any other datasets not in the hints file will remain the same.
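As an illustration of the download, edit, and upload workflow above, the following minimal Python sketch changes the time definition of one dataset in a downloaded manifest. The key names (`datasets`, `name`, `time`, `timeType`, `timeReference`, `fields`, `formats`), the dataset and field names, and the helper `set_instant_time` are assumptions made for this example. Compare them with your own downloaded .json file and with Understanding a big data file share manifest before relying on them.

```python
import json


def set_instant_time(manifest, dataset_name, field_name, time_formats, time_zone="UTC"):
    """Point one dataset's time definition at a single instant field.

    The key names used here are assumptions for illustration; compare them
    with the manifest you downloaded before using this on real data.
    """
    for dataset in manifest.get("datasets", []):
        if dataset.get("name") == dataset_name:
            dataset["time"] = {
                "timeType": "instant",
                "timeReference": {"timeZone": time_zone},
                "fields": [{"name": field_name, "formats": list(time_formats)}],
            }
            return manifest
    raise KeyError(f"No dataset named {dataset_name!r} in the manifest")


# Stand-in for the contents of a downloaded manifest (hypothetical data).
manifest = json.loads(
    '{"datasets": [{"name": "earthquakes", "time": {"timeType": "none"}}]}'
)

set_instant_time(manifest, "earthquakes", "event_time", ["MM/dd/yyyy HH:mm:ss"])

# Serialize the edited manifest; this text is what you would save as the
# .json file to upload on the Advanced tab.
edited_json = json.dumps(manifest, indent=2)
```

Editing the parsed JSON rather than the raw text keeps the file syntactically valid, which is the main risk when changing a manifest by hand.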
Regenerate the manifest for a big data file share
After a big data file share is created and a manifest has been generated, a regenerate manifest button appears for each entry on the Registered Data Stores dialog box.
You can regenerate a manifest if you have added new data or if you have uploaded a hints file using the edit resource. The hints file provides specifications that are used when regenerating the manifest.
Note:
When a manifest is regenerated, the manifest is updated for new datasets and for existing datasets that have a hints file. Any edits you have made to the manifest will be overwritten with the rules defined in the hints file.
Big data file share editing parameters
The big data file share editor comprises the following five sections:
- Dataset selector
- Fields
- Geometry
- Time
- Dataset format
It is recommended to use a hints file before editing your data if manifest generation did not correctly determine field names, encoding, field delimiters, or quote characters.
Dataset selector
A manifest is composed of one or more datasets. The number of datasets is dependent on the number of folders in your big data file share location. When you open the manifest manager, you can view the datasets that have been successfully registered in your big data file share. When you select a dataset from the drop-down menu, the dataset parameters will be populated with the dataset information.
If you expected to find more datasets in your manifest or are missing any, do the following:
- Verify that you correctly registered the top-level folder. For more information, see Register your data with ArcGIS Server Manager.
- Check that your input data is in an allowable format, such as a collection of delimited files, shapefiles, parquet, or ORC.
- Ensure that the schema is consistent across the collection of files that make up each input dataset (all files in a single dataset must have the same fields).
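The last check in the list above can be automated for delimited files. The following is a minimal sketch, assuming each file has a header row and that one folder holds one dataset. The function names and the sample file names are hypothetical, and the check only compares header rows, not field types.

```python
import csv
import tempfile
from pathlib import Path


def read_header(path, delimiter=",", encoding="utf-8"):
    """Return the first row of a delimited file as a list of field names."""
    with open(path, newline="", encoding=encoding) as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def files_with_different_header(folder, pattern="*.csv", delimiter=","):
    """Return names of files whose header differs from the first file's header."""
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        return []
    expected = read_header(paths[0], delimiter)
    return [p.name for p in paths[1:] if read_header(p, delimiter) != expected]


# Demo on a throwaway folder standing in for one dataset folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.csv").write_text("id,x,y\n1,10,20\n", encoding="utf-8")
    Path(folder, "b.csv").write_text("id,x,y\n2,11,21\n", encoding="utf-8")
    Path(folder, "c.csv").write_text("id,lon,lat\n3,12,22\n", encoding="utf-8")
    mismatched = files_with_different_header(folder)
```

In this demo, `mismatched` contains only `c.csv`, because its field names differ from those in the first file.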
Fields
The fields section lists all of the fields in a dataset. When you select a dataset, you will be able to see the following for each field:
- The name of the field
- The field type
The field name and type can be modified for delimited files. If you are modifying more than one field name, it is recommended to use a hints file.
If the input dataset is a delimited file, there will be multiple parameters that can be modified in the manifest in ArcGIS Server Manager.
Geometry
The geometry section lists the type of geometry, and how it is represented. The following table outlines the available options, with notes for changes you can make, depending on the input dataset type:
Geometry parameters
Parameter | Description | Delimited files | Shapefiles | ORC files | Parquet files |
---|---|---|---|---|---|
Geometry | The Geometry type. Options are Point, Polyline, Polygon, or None. If there is no geometry, the input is a table. | Editable | Cannot be modified | Editable | Editable |
Spatial reference (WKID/WKT) | The spatial reference of the dataset. This option is only shown if the dataset is not a table. | This can be modified. By default, it will be set to 4326, WGS 1984. | Cannot be modified | Editable | Editable |
Geometry formatting type | How the geometry is formatted for each feature. Options are XYZ (fields that represent X, Y, and optionally Z values—XYZ is only applicable to points), WKT (well known text), GeoJson, EsriJson, and shape. This option is only available if the dataset is not a table and not a shapefile. | Editable | Not available | Editable | Editable |
Time
The time section outlines how time is represented. The following table outlines the available options, with notes for changes you can make, depending on the input dataset type. Time options are the same for all data types, except where noted.
Time parameters
Parameter | Description | Example |
---|---|---|
Time type | The type of the input time. Options are Instant (a single moment in time), Interval (a span of time with a start and end time), and None. | Instant |
Time zone | The time zone of the input time. This option is only available if Time Type is not None. | UTC |
Name and formatting table for time | This table selects the time field or fields, and outlines how time is defined. Time can use one or more fields to define time, as well as use one or more formats for a single field. By default, the first field with the name "time" will be used as the time field, with an estimate of the time format. If there is a shapefile, the first field of type "date" will be used. If time is of type Interval, there must be a start and end time specified. The time formatting table is only available if Time Type is not None. | Example with a single field used to represent time with two different formats. Example with two fields used to represent time. |
Time formats
The following table outlines how to represent time when you edit a big data file share through ArcGIS Server Manager or directly in a manifest. The examples show how to represent the time January 2, 2016, at 9:45:02.05 PM.
Time formats in big data file shares
Symbol | Meaning | Example |
---|---|---|
yy | The year, represented by two digits. | 16 |
yyyy | The year, represented by four digits. | 2016 |
MM | The month, represented numerically. | 01 or 1 |
MMM | The month, represented using three letters. | Jan |
MMMM | The month, represented using the complete spelling. | January |
dd | The day. | 02 or 2 |
HH | The hour when using a 24-hour day; values range from 0-23. | 21 |
hh | The hour when using a 12-hour day; values range from 1-12. | 9 |
mm | The minute; values range from 0-59. | 45 |
ss | The second; values range from 0-59. | 02 |
SSS | The millisecond; values range from 0-999. | 050 |
a | The AM/PM marker. | PM |
epoch_millis | The time in milliseconds from epoch. | 1509581781000 |
epoch_seconds | The time in seconds from epoch. | 1509747601 |
Z | The time zone offset expressed in hours. | -0100 or -01:00 |
ZZZ | The time zone offset expressed using IDs. | America/Los_Angeles |
'' | Use single quotes to add text that doesn't represent a value outlined in this table. | 'T' |
The following table shows examples for different formats of the same date, January 2, 2016, at 9:45:02.05 PM:
Time format examples
Input date | Date format |
---|---|
01/02/2016 9:45:02PM | MM/dd/yyyy hh:mm:ssa |
Jan02-16 21:45:02 | MMMdd-yy HH:mm:ss |
January 02 2016 9:45:02.050PM | MMMM dd yyyy hh:mm:ss.SSSa |
01/02/2016T21:45:02-0000 | MM/dd/yyyy'T'HH:mm:ssZ |
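To check a format string against sample values before entering it in the editor, you can approximate the symbols above with Python's `datetime.strptime`. This is a rough sketch, not the parser that GeoAnalytics Server uses. The helpers `to_strptime` and `parse_time` are hypothetical, the mapping covers only the symbols in the table, ZZZ is rejected because `strptime` has no directive for time zone IDs, and month names and AM/PM markers depend on the Python locale.

```python
from datetime import datetime, timezone

# Longest symbols first so that "MMMM" is not read as "MM" + "MM".
# A directive of None marks a symbol with no strptime equivalent.
_SYMBOLS = [
    ("ZZZ", None), ("yyyy", "%Y"), ("MMMM", "%B"), ("MMM", "%b"),
    ("SSS", "%f"), ("yy", "%y"), ("MM", "%m"), ("dd", "%d"),
    ("HH", "%H"), ("hh", "%I"), ("mm", "%M"), ("ss", "%S"),
    ("a", "%p"), ("Z", "%z"),
]


def to_strptime(pattern):
    """Translate a time format from the table into a strptime format string."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "'":
            # Text in single quotes is copied through as a literal.
            end = pattern.index("'", i + 1)
            out.append(pattern[i + 1:end].replace("%", "%%"))
            i = end + 1
            continue
        for symbol, directive in _SYMBOLS:
            if pattern.startswith(symbol, i):
                if directive is None:
                    raise ValueError(f"{symbol} has no strptime equivalent")
                out.append(directive)
                i += len(symbol)
                break
        else:
            # Separators such as "/", ":", "-", and spaces pass through.
            out.append(pattern[i].replace("%", "%%"))
            i += 1
    return "".join(out)


def parse_time(value, pattern):
    """Parse a time string using one of the formats from the table."""
    if pattern == "epoch_millis":
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    if pattern == "epoch_seconds":
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return datetime.strptime(value, to_strptime(pattern))


first = parse_time("01/02/2016 9:45:02PM", "MM/dd/yyyy hh:mm:ssa")
second = parse_time("Jan02-16 21:45:02", "MMMdd-yy HH:mm:ss")
third = parse_time("January 02 2016 9:45:02.050PM", "MMMM dd yyyy hh:mm:ss.SSSa")
```

The first two values parse to the same moment, January 2, 2016, at 21:45:02. The third also carries the 50 milliseconds.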
Dataset format
The dataset format section outlines the format the data is in. Data may be in one of the following formats:
- Shapefile (.shp)
- Delimited file (for example .csv)
- Parquet file
- ORC file
The available parameters differ depending on the dataset. For shapefiles, ORC files, and parquet files, the only parameter is the file type, which cannot be modified. If the input dataset is a delimited file, there are multiple parameters that can be modified. To modify values for a delimited file, use a hints file and regenerate the manifest. The parameters are outlined in the following table:
Dataset formats
Parameter | Description |
---|---|
File extension | Lists the file type extension on the input dataset. Common formats are .csv and .txt. Modify this information for a delimited file using a hints file. |
Field delimiter | Determines the delimiter for each field. Common formats are , and ;. Modify this information for a delimited file using a hints file. |
Record terminator | Determines the terminator for each row of data. Common formats are \n and \t. Modify this information for a delimited file using a hints file. |
Quote character | Determines the character used for quotes. Modify this information for a delimited file using a hints file. |
Has header row | A Boolean that determines whether the input table includes a header row. If a header row is included, the headers will be used for the field names. Field name information is used to predict the geometry and time fields. Set header rows using the hints file. |
Encoding | The type of encoding used on the file. By default, this will be UTF-8. This is set with a hints file. |
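To see how the parameters in the table interact, the following sketch reads a small delimited sample with Python's `csv` module. It only illustrates what the field delimiter, quote character, header row, and encoding settings mean. It is not how GeoAnalytics Server reads data. The function `read_delimited` and the placeholder field names `col_1`, `col_2`, and so on are hypothetical.

```python
import csv
import io


def read_delimited(raw_bytes, field_delimiter=",", quote_char='"',
                   has_header_row=True, encoding="utf-8"):
    """Return (field_names, records) for delimited data held in memory."""
    text = raw_bytes.decode(encoding)
    rows = list(csv.reader(io.StringIO(text, newline=""),
                           delimiter=field_delimiter, quotechar=quote_char))
    if not rows:
        return [], []
    if has_header_row:
        return rows[0], rows[1:]
    # Without a header row, make up placeholder names (hypothetical naming).
    names = [f"col_{i}" for i in range(1, len(rows[0]) + 1)]
    return names, rows


# A semicolon-delimited sample that uses single quotes as the quote character.
sample = "id;name;note\n1;'Smith; J.';ok\n".encode("utf-8")
field_names, records = read_delimited(sample, field_delimiter=";", quote_char="'")
```

With the correct quote character, the semicolon inside `'Smith; J.'` stays part of one value. With the wrong quote character, the same record would split into four fields, which is the kind of mismatch a hints file is meant to correct.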
Big data file share output template editing parameters
The big data file share output template editor comprises the following four sections:
- Output template selector
- Geometry formatting
- Time formatting
- Dataset format
Output template selector
A big data file share is optionally composed of one or more templates. The number of templates is determined by the different formats to which you want to write results. When you open the output template manager, you can view the templates that have been successfully registered in your big data file share. When you select a template from the drop-down menu, the template parameters will be populated with the output formatting information. If you want to add a new template, select the Add template option, and select the type and name of the new template. If you want to delete a template, select it from the template selector, and select Delete template. You can modify an existing template by selecting it, and modifying any of the sections below as needed.
Note:
The input big data file shares have a fields section. The output templates do not have a fields section, since the resulting fields are determined by the GeoAnalytics Tools creating the result.
Geometry
The geometry section lists how you would like the output geometry to be formatted for each geometry type (point, line, polygon). There are two parts to determining the output geometry:
- The spatial reference. You can leave it empty to use the spatial reference of the tool results (default), or provide a WKID or WKT string to project all results to that spatial reference. This value is shared across all output geometries.
- The geometry formatting type and fields. This is described in more detail below.
Output geometry formats
Geometry type | Output Fields | Delimited files | Shapefiles | ORC files | Parquet files |
---|---|---|---|---|---|
XYZ—An X, Y, and optionally Z field. This option is only available for points. | By default, three new fields will be created named X, Y, and Z. You can optionally change these field names. | ||||
WKT | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
GeoJSON | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
EsriJSON | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
SHP | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
WKB | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
Shape Buffer | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
Time
The time section outlines how output time is represented. Formatting time requires the following information:
- Formatting for both instants and intervals.
- The field names to which time will be written.
- The format (String or Date) that time will be written as. Note that delimited files can only be formatted with string.
- For intervals, which fields represent the start and end time.
Time formatting is the same as for input big data files. See Time formats in a big data file share.
Dataset format
The dataset format section outlines the output format to which the data will be written. Data may be in one of the following formats:
- Shapefile (.shp)
- Delimited file (for example .csv)
- Parquet file
- ORC file
The available parameters differ depending on the output format. For shapefiles, ORC files, and parquet files, the only parameter is the file type, which cannot be modified. If the output dataset is a delimited file, there are multiple parameters that can be modified in ArcGIS Server Manager. These are outlined in the following table:
Dataset formats
Parameter | Description |
---|---|
File extension | Extensions are never applied to an output dataset. |
Field delimiter | Determines the delimiter for each field. Common formats are , and ;. |
Record terminator | The terminator for each row of data cannot be set. On Windows, the terminator is \r\n; on Linux, it is \n. |
Quote character | Determines the character used for quotes. |
Has header row | A Boolean that determines if the output table will include a header row representing the field names. By default, this is true. |
Encoding | This will always be UTF-8. |
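The earlier example of an output template that writes a tab-delimited file with WKT geometry can be pictured with the following sketch. It only shows what such a file might look like. The field names, coordinates, and helper functions are hypothetical, and the record terminator is fixed to \n here purely to make the demo output predictable.

```python
import csv
import io


def point_wkt(x, y):
    """Format a 2D point as well-known text (WKT)."""
    return f"POINT ({x} {y})"


def write_tab_delimited(handle, field_names, rows, has_header_row=True):
    """Write rows as tab-delimited text, optionally preceded by a header row."""
    writer = csv.writer(handle, delimiter="\t", quotechar='"', lineterminator="\n")
    if has_header_row:
        writer.writerow(field_names)
    writer.writerows(rows)


buffer = io.StringIO()
write_tab_delimited(
    buffer,
    ["id", "Geometry"],
    [[1, point_wkt(-117.19, 34.05)], [2, point_wkt(-118.24, 34.06)]],
)
output_text = buffer.getvalue()
```

Here `output_text` holds a header line followed by one line per feature, with the WKT string in the Geometry field. A tab delimiter is a convenient choice with WKT geometry: WKT strings contain commas whenever a geometry has more than one vertex, so a comma-delimited file would need quoting around them.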