# Supported Dataset Formats

Earthscale supports a wide range of geospatial data formats for both raster and vector datasets. This page lists the supported formats, the ways to upload or connect data, and the limits that apply to each.

## Raster Formats

Raster datasets represent gridded data such as satellite imagery, elevation models, and climate data.

| Format    | Extensions                                     | Notes                                                                                                                  |
| --------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| GeoTIFF   | `.tif`, `.tiff`, `.gtif`, `.gtiff`, `.geotiff` | Standard georeferenced TIFF. Cloud Optimized GeoTIFFs (COGs) are fully supported and recommended for best performance. |
| JPEG 2000 | `.jp2`                                         | Wavelet-based compression format commonly used for satellite imagery.                                                  |
| Zarr      | `.zarr` (directory)                            | Chunked, compressed array format ideal for large multidimensional datasets. Supports Zarr v2 and v3.                   |

### Cloud Optimized GeoTIFFs (COGs)

We recommend using [Cloud Optimized GeoTIFFs](https://www.cogeo.org/) when possible. COGs include internal tiling and overviews that enable efficient streaming directly from cloud storage, resulting in faster visualization and reduced data transfer.

### Multi-File Raster Datasets

Earthscale natively supports multi-file raster datasets, including:

* **Raster mosaics**: Collections of GeoTIFFs covering different spatial tiles or UTM zones
* **Time series**: Files representing different timestamps with dates encoded in filenames
* **Multi-band collections**: Separate files for each spectral band

Use wildcard patterns like `gs://bucket/path/*.tif` to register these collections as a single dataset. See [Raster Mosaic and Time Series Support](/earthscale-documentation/advanced-dataset-configuration.md) for configuration options.
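Before registering a time series, it can help to preview locally which files a wildcard would pick up and which dates can be read from their names. The sketch below is only an illustration: the `YYYYMMDD` naming convention, the helper names, and the use of Python's `fnmatch` for glob-style matching are assumptions, not Earthscale's actual matching or date-parsing logic.

```python
import re
from datetime import date
from fnmatch import fnmatch
from typing import Optional

# Assumed naming convention: an 8-digit YYYYMMDD date somewhere in the filename.
DATE_PATTERN = re.compile(r"(\d{4})(\d{2})(\d{2})")


def date_from_filename(name: str) -> Optional[date]:
    """Return the first YYYYMMDD date found in the filename, or None."""
    match = DATE_PATTERN.search(name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # eight digits that are not a valid calendar date
        return None


def preview_time_series(names, pattern="*.tif"):
    """Return (date, name) pairs for names matching the pattern, oldest first."""
    dated = []
    for name in names:
        if not fnmatch(name, pattern):
            continue  # skipped by the wildcard
        parsed = date_from_filename(name)
        if parsed is not None:
            dated.append((parsed, name))
    return sorted(dated)


# Hypothetical file listing, for illustration only
files = ["ndvi_20230301.tif", "ndvi_20230101.tif", "readme.txt", "ndvi_latest.tif"]
for day, name in preview_time_series(files):
    print(day.isoformat(), name)
```

Files without a parseable date (here `ndvi_latest.tif`) are silently dropped from the preview, which makes them easy to spot before registration.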

## Vector Formats

Vector datasets represent discrete geographic features such as points, lines, and polygons.

| Format     | Extensions                                      | Notes                                                                                                                |
| ---------- | ----------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| GeoJSON    | `.geojson`, `.json`                             | Standard JSON-based format for vector features. Also supports newline-delimited variants (`.geojsonl`, `.geojsons`). |
| FlatGeobuf | `.fgb`                                          | Binary format optimized for streaming and cloud storage. Recommended for large vector datasets.                      |
| GeoParquet | `.parquet`, `.geoparquet`                       | Column-oriented format with excellent compression. Ideal for large datasets with many attributes.                    |
| Shapefile  | `.shp` + `.shx` + `.dbf` (folder) or `.shp.zip` | Legacy format requiring multiple sidecar files. Upload as a folder or zipped archive.                                |
| KML/KMZ    | `.kml`, `.kmz`                                  | Google Earth format. KMZ files are compressed KML archives.                                                          |
| ZIP        | `.zip`                                          | Zipped shapefile archives are automatically extracted.                                                               |

### Shapefile Requirements

Shapefiles consist of multiple files that must be uploaded together:

* `.shp` - Feature geometry (required)
* `.shx` - Shape index (required)
* `.dbf` - Attribute data (required)
* `.prj` - Projection definition (recommended)
* `.cpg` - Character encoding (optional)

Upload shapefiles either as:

1. A **folder** containing all component files
2. A **ZIP archive** with all files at the root level
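The ZIP option can be prepared with a short script. The following is a minimal sketch using only the Python standard library; the function name `zip_shapefile` and the example paths are hypothetical. It checks that the three required components are present and writes every component at the archive root.

```python
import zipfile
from pathlib import Path

REQUIRED = {".shp", ".shx", ".dbf"}
OPTIONAL = {".prj", ".cpg"}


def zip_shapefile(shp_path, zip_path):
    """Bundle a shapefile and its sidecar files into a ZIP, all at the root level."""
    shp = Path(shp_path)
    # Sidecar files share the .shp file's stem and live in the same folder
    components = {
        p.suffix.lower(): p
        for p in shp.parent.iterdir()
        if p.is_file() and p.stem == shp.stem and p.suffix.lower() in REQUIRED | OPTIONAL
    }
    missing = REQUIRED - set(components)
    if missing:
        raise FileNotFoundError(f"missing required components: {sorted(missing)}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in components.values():
            # arcname=path.name drops the folder prefix, so files sit at the root
            archive.write(path, arcname=path.name)
    return sorted(path.name for path in components.values())
```

For example, `zip_shapefile("data/roads.shp", "roads.shp.zip")` would return the names of the files that were added to the archive.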

## Zarr Stores

Zarr is a format for chunked, compressed N-dimensional arrays. Earthscale automatically detects Zarr stores by looking for metadata files:

* `zarr.json` (Zarr v3)
* `.zmetadata` (consolidated metadata)
* `.zgroup` or `.zarray` (Zarr v2)

Zarr stores can include multiple variables with dimensions like time, x, y, and additional coordinates. Earthscale will automatically infer spatial dimensions and coordinate reference systems.
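To check locally whether a directory would be recognized as a Zarr store, you can look for the same marker files. This is a rough sketch for local directories only; `detect_zarr` is a hypothetical helper, and Earthscale's own detection logic may differ in order or detail.

```python
from pathlib import Path

# Marker files from the list above, mapped to what they indicate
MARKERS = [
    ("zarr.json", "Zarr v3"),
    (".zmetadata", "consolidated metadata"),
    (".zgroup", "Zarr v2"),
    (".zarray", "Zarr v2"),
]


def detect_zarr(path):
    """Return a label for the first marker file found in the directory, or None."""
    root = Path(path)
    for filename, label in MARKERS:
        if (root / filename).is_file():
            return label
    return None
```

A return value of `None` means none of the marker files sit at the top level of the directory, so the store is probably incomplete or the path points one level too high or too low.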

## Upload Limits

Different limits apply depending on your upload method:

### Local File Upload

| Constraint                   | Limit       |
| ---------------------------- | ----------- |
| Maximum files per upload     | 1,000 files |
| Maximum size per upload      | 10 GB       |
| Maximum individual file size | 10 GB       |

For larger datasets, use cloud storage and provide the URL directly.
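A quick pre-flight check can tell you whether a folder fits within the local upload limits before you start. The sketch below is illustrative: the helper names are hypothetical, and it assumes decimal gigabytes (10^9 bytes), since this page does not state whether GB or GiB is meant.

```python
from pathlib import Path

GB = 1000**3  # assumption: decimal gigabytes
MAX_FILES = 1_000
MAX_TOTAL_BYTES = 10 * GB
MAX_FILE_BYTES = 10 * GB


def check_local_upload(sizes):
    """Return a list of problems for the given file sizes (in bytes)."""
    problems = []
    if len(sizes) > MAX_FILES:
        problems.append(f"{len(sizes)} files exceeds the {MAX_FILES}-file limit")
    if sum(sizes) > MAX_TOTAL_BYTES:
        problems.append("total size exceeds 10 GB")
    if any(size > MAX_FILE_BYTES for size in sizes):
        problems.append("at least one file exceeds 10 GB")
    return problems


def file_sizes(folder):
    """Collect the size of every regular file under a folder, recursively."""
    return [p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()]
```

Calling `check_local_upload(file_sizes("my_dataset"))` returns an empty list when the folder is within all three limits.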

### Cloud Storage

| Constraint                        | Limit        |
| --------------------------------- | ------------ |
| Maximum files for raster mosaics  | 15,000 files |
| Maximum files for vector datasets | 100 files    |
| Maximum vector dataset input size | 8 GB         |

Need to work with larger datasets? [Contact us](mailto:support@earthscale.ai) for enterprise options.

## Cloud Storage Sources

Connect datasets directly from your cloud storage:

| Provider             | URL Format                                               | Example or note                                          |
| -------------------- | -------------------------------------------------------- | -------------------------------------------------------- |
| Google Cloud Storage | `gs://bucket/path`                                       | `gs://my-bucket/data/*.tif`                              |
| AWS S3               | `s3://bucket/path`                                       | `s3://my-bucket/imagery/scene.tif`                       |
| Azure Blob Storage   | `https://{account}.blob.core.windows.net/container/path` | `https://myaccount.blob.core.windows.net/data/image.tif` |
| Google Drive         | Shareable link                                           | Single files only                                        |
| HTTP/HTTPS           | `https://example.com/path`                               | Public URLs                                              |

Wildcard patterns (`*`) are supported for registering multiple files as a single dataset.
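If you generate dataset URLs in a script, a small check can catch malformed ones early. The following sketch maps a URL to the provider rows in the table above and reports whether it contains a wildcard. `classify_source` is a hypothetical helper, and treating `drive.google.com` as the Google Drive host is an assumption; this is not how Earthscale itself validates URLs.

```python
from urllib.parse import urlparse


def classify_source(url):
    """Return (provider, has_wildcard) for a dataset URL, or (None, ...) if unrecognized."""
    parsed = urlparse(url)
    has_wildcard = "*" in url
    if parsed.scheme == "gs":
        return "Google Cloud Storage", has_wildcard
    if parsed.scheme == "s3":
        return "AWS S3", has_wildcard
    if parsed.scheme in ("http", "https"):
        host = parsed.hostname or ""
        if host.endswith(".blob.core.windows.net"):
            return "Azure Blob Storage", has_wildcard
        if host == "drive.google.com":  # assumed host for shareable links
            return "Google Drive", has_wildcard
        return "HTTP/HTTPS", has_wildcard
    return None, has_wildcard


for url in ["gs://my-bucket/data/*.tif", "s3://my-bucket/imagery/scene.tif"]:
    print(url, classify_source(url))
```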

## Unsupported Formats

The following formats are not yet supported:

| Format                    | Alternative                                                         |
| ------------------------- | ------------------------------------------------------------------- |
| NetCDF (`.nc`, `.nc4`)    | Convert to Zarr using `xarray` or to GeoTIFF using `gdal_translate` |
| HDF4/HDF5 (`.hdf`, `.h5`) | Convert to Zarr or GeoTIFF                                          |
| PMTiles (`.pmtiles`)      | Use the source vector data format instead                           |
| GRIB (`.grib`, `.grib2`)  | Convert to Zarr or GeoTIFF                                          |

### Converting NetCDF to Zarr

```python
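import xarray as xr

# Open the NetCDF file (variables are loaded lazily)
ds = xr.open_dataset("input.nc")
# Write the dataset to a Zarr store, which is created as a directory
ds.to_zarr("output.zarr")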
```

### Converting to Cloud Optimized GeoTIFF

Converting to COG is optional, but it may improve rendering speed. The `COG` output driver requires GDAL 3.1 or later.

```bash
gdal_translate -of COG -co COMPRESS=ZSTD input.tif output.tif
```

## Next Steps

* [Access Your Own Data](/earthscale-documentation/access-your-own-data.md) - Connect your cloud storage
* [Raster Mosaic and Time Series Support](/earthscale-documentation/advanced-dataset-configuration.md) - Configure multi-file datasets
* [Local File Upload](/earthscale-documentation/access-your-own-data/local-file-upload.md) - Upload from your computer


