# Supported Dataset Formats

Earthscale supports a wide range of geospatial data formats for both raster and vector datasets. This page is a reference for the supported formats, the available upload methods, and the limits that apply to each.

## Raster Formats

Raster datasets represent gridded data such as satellite imagery, elevation models, and climate data.

| Format    | Extensions                                     | Notes                                                                                                                  |
| --------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| GeoTIFF   | `.tif`, `.tiff`, `.gtif`, `.gtiff`, `.geotiff` | Standard georeferenced TIFF. Cloud Optimized GeoTIFFs (COGs) are fully supported and recommended for best performance. |
| JPEG 2000 | `.jp2`                                         | Wavelet-based compression format commonly used for satellite imagery.                                                  |
| Zarr      | `.zarr` (directory)                            | Chunked, compressed array format ideal for large multidimensional datasets. Supports Zarr v2 and v3.                   |

### Cloud Optimized GeoTIFFs (COGs)

We recommend using [Cloud Optimized GeoTIFFs](https://www.cogeo.org/) when possible. COGs include internal tiling and overviews that enable efficient streaming directly from cloud storage, resulting in faster visualization and reduced data transfer.

### Multi-File Raster Datasets

Earthscale natively supports multi-file raster datasets, including:

* **Raster mosaics**: Collections of GeoTIFFs covering different spatial tiles or UTM zones
* **Time series**: Files representing different timestamps with dates encoded in filenames
* **Multi-band collections**: Separate files for each spectral band

Use wildcard patterns like `gs://bucket/path/*.tif` to register these collections as a single dataset. See [Raster Mosaic and Time Series Support](https://docs.earthscale.ai/earthscale-documentation/advanced-dataset-configuration) for configuration options.
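
To preview which objects a wildcard pattern might pick up before registering a dataset, you can approximate the matching locally. The sketch below uses Python's `fnmatch` module on a made-up bucket listing. Treat it as an approximation only: Earthscale's own matching rules (for example, whether `*` crosses `/` boundaries) are not described here and may differ from `fnmatch`.

```python
from fnmatch import fnmatchcase
from typing import List


def preview_matches(pattern: str, object_urls: List[str]) -> List[str]:
    """Return the URLs that a shell-style wildcard pattern would select."""
    return [url for url in object_urls if fnmatchcase(url, pattern)]


# Hypothetical bucket listing, for illustration only
urls = [
    "gs://bucket/path/tile_001.tif",
    "gs://bucket/path/tile_002.tif",
    "gs://bucket/path/readme.txt",
]

# Only the two .tif objects are selected
print(preview_matches("gs://bucket/path/*.tif", urls))
```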

## Vector Formats

Vector datasets represent discrete geographic features such as points, lines, and polygons.

| Format     | Extensions                                      | Notes                                                                                                                |
| ---------- | ----------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| GeoJSON    | `.geojson`, `.json`                             | Standard JSON-based format for vector features. Also supports newline-delimited variants (`.geojsonl`, `.geojsons`). |
| FlatGeobuf | `.fgb`                                          | Binary format optimized for streaming and cloud storage. Recommended for large vector datasets.                      |
| GeoParquet | `.parquet`, `.geoparquet`                       | Column-oriented format with excellent compression. Ideal for large datasets with many attributes.                    |
| Shapefile  | `.shp` + `.shx` + `.dbf` (folder) or `.shp.zip` | Legacy format requiring multiple sidecar files. Upload as a folder or zipped archive.                                |
| KML/KMZ    | `.kml`, `.kmz`                                  | Google Earth format. KMZ files are compressed KML archives.                                                          |
| ZIP        | `.zip`                                          | Zipped shapefile archives are automatically extracted.                                                               |

### Shapefile Requirements

Shapefiles consist of multiple files that must be uploaded together:

* `.shp` - Feature geometry (required)
* `.shx` - Shape index (required)
* `.dbf` - Attribute data (required)
* `.prj` - Projection definition (recommended)
* `.cpg` - Character encoding (optional)

Upload shapefiles either as:

1. A **folder** containing all component files
2. A **ZIP archive** with all files at the root level
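
The requirements above can be checked and packaged with a short script. This is a minimal sketch using only the Python standard library; the function name `zip_shapefile` is hypothetical, and it assumes all component files share the same base name and use lowercase extensions.

```python
import zipfile
from pathlib import Path
from typing import List

REQUIRED = (".shp", ".shx", ".dbf")  # must be present
OPTIONAL = (".prj", ".cpg")          # included when present


def zip_shapefile(shp_path: str, zip_path: str) -> List[str]:
    """Bundle a shapefile's components into a ZIP with all files at the root level."""
    shp = Path(shp_path)
    missing = [ext for ext in REQUIRED if not shp.with_suffix(ext).exists()]
    if missing:
        raise FileNotFoundError(f"Missing required shapefile components: {missing}")

    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for ext in REQUIRED + OPTIONAL:
            part = shp.with_suffix(ext)
            if part.exists():
                # arcname is the bare file name, so no folders end up in the archive
                zf.write(part, arcname=part.name)
                added.append(part.name)
    return added
```

For example, `zip_shapefile("data/roads.shp", "roads.shp.zip")` would return the list of component names written to the archive, or raise an error if a required file is missing.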

## Zarr Stores

Zarr is a format for chunked, compressed N-dimensional arrays. Earthscale automatically detects Zarr stores by looking for metadata files:

* `zarr.json` (Zarr v3)
* `.zmetadata` (Zarr v2 consolidated metadata)
* `.zgroup` or `.zarray` (Zarr v2)

Zarr stores can include multiple variables with dimensions like time, x, y, and additional coordinates. Earthscale will automatically infer spatial dimensions and coordinate reference systems.
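
To check locally whether a directory would be recognized as a Zarr store, you can look for the same marker files. The sketch below is an illustration for local directories only; the function name `detect_zarr` is hypothetical, and the order in which Earthscale checks the markers is an assumption.

```python
from pathlib import Path
from typing import Optional

# Marker files from the list above, checked in this (assumed) order
ZARR_MARKERS = {
    "zarr.json": "Zarr v3",
    ".zmetadata": "Zarr v2 (consolidated metadata)",
    ".zgroup": "Zarr v2",
    ".zarray": "Zarr v2",
}


def detect_zarr(store_dir: str) -> Optional[str]:
    """Return a label for the first marker file found at the store root, else None."""
    root = Path(store_dir)
    for name, kind in ZARR_MARKERS.items():
        if (root / name).is_file():
            return kind
    return None
```

A return value of `None` means no marker file was found at the top level of the directory, which usually indicates that the path points at a parent or child of the actual store.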

## Upload Limits

Different limits apply depending on your upload method:

### Local File Upload

| Constraint                   | Limit       |
| ---------------------------- | ----------- |
| Maximum files per upload     | 1,000 files |
| Maximum size per upload      | 10 GB       |
| Maximum individual file size | 10 GB       |

For larger datasets, use cloud storage and provide the URL directly.
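
Before starting a local upload, a quick pre-flight check against the limits in the table can save a failed attempt. This is a rough sketch: the function name `check_local_upload` is hypothetical, it assumes 1 GB means 10^9 bytes, and it assumes every file in the folder (including subfolders) counts toward the limits.

```python
from pathlib import Path
from typing import List

GB = 1000**3  # assumption: decimal gigabytes


def check_local_upload(
    folder: str,
    max_files: int = 1_000,
    max_total_bytes: int = 10 * GB,
    max_file_bytes: int = 10 * GB,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no limit is exceeded."""
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    sizes = [p.stat().st_size for p in files]

    problems = []
    if len(files) > max_files:
        problems.append(f"{len(files)} files exceeds the limit of {max_files}")
    if sum(sizes) > max_total_bytes:
        problems.append(f"total size {sum(sizes)} bytes exceeds {max_total_bytes}")
    for path, size in zip(files, sizes):
        if size > max_file_bytes:
            problems.append(f"{path.name} is {size} bytes, over the per-file limit")
    return problems
```

If the returned list is not empty, consider placing the data in cloud storage instead.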

### Cloud Storage

| Constraint                        | Limit        |
| --------------------------------- | ------------ |
| Maximum files for raster mosaics  | 15,000 files |
| Maximum files for vector datasets | 100 files    |
| Maximum vector dataset input size | 8 GB         |

Need to work with larger datasets? [Contact us](mailto:support@earthscale.ai) for enterprise options.

## Cloud Storage Sources

Connect datasets directly from your cloud storage:

| Provider             | URL Format                                               | Example or note                                          |
| -------------------- | -------------------------------------------------------- | -------------------------------------------------------- |
| Google Cloud Storage | `gs://bucket/path`                                       | `gs://my-bucket/data/*.tif`                              |
| AWS S3               | `s3://bucket/path`                                       | `s3://my-bucket/imagery/scene.tif`                       |
| Azure Blob Storage   | `https://{account}.blob.core.windows.net/container/path` | `https://myaccount.blob.core.windows.net/data/image.tif` |
| Google Drive         | Shareable link                                           | Single files only                                        |
| HTTP/HTTPS           | `https://example.com/path`                               | Public URLs                                              |

Wildcard patterns (`*`) are supported for registering multiple files as a single dataset.
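
The URL formats in the table can be told apart by scheme and host name. The following sketch shows one way to classify a URL before submitting it; the function name `classify_source` is hypothetical, and treating `drive.google.com` as the host for Google Drive shareable links is an assumption rather than something this page specifies.

```python
from urllib.parse import urlparse


def classify_source(url: str) -> str:
    """Map a dataset URL to one of the providers listed in the table above."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()

    if parsed.scheme == "gs":
        return "Google Cloud Storage"
    if parsed.scheme == "s3":
        return "AWS S3"
    if parsed.scheme in ("http", "https"):
        if host.endswith(".blob.core.windows.net"):
            return "Azure Blob Storage"
        if host == "drive.google.com":  # assumed host for shareable links
            return "Google Drive"
        return "HTTP/HTTPS"
    raise ValueError(f"Unrecognized source URL: {url}")


print(classify_source("gs://my-bucket/data/*.tif"))
print(classify_source("https://myaccount.blob.core.windows.net/data/image.tif"))
```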

## Unsupported Formats

The following formats are not yet supported:

| Format                    | Alternative                                                         |
| ------------------------- | ------------------------------------------------------------------- |
| NetCDF (`.nc`, `.nc4`)    | Convert to Zarr using `xarray` or to GeoTIFF using `gdal_translate` |
| HDF4/HDF5 (`.hdf`, `.h5`) | Convert to Zarr or GeoTIFF                                          |
| PMTiles (`.pmtiles`)      | Use the source vector data format instead                           |
| GRIB (`.grib`, `.grib2`)  | Convert to Zarr or GeoTIFF                                          |

### Converting NetCDF to Zarr

```python
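import xarray as xr

# Reading NetCDF needs a backend such as netCDF4 or h5netcdf; writing needs the zarr package
ds = xr.open_dataset("input.nc")
# Write the dataset as a Zarr store (a directory on disk)
ds.to_zarr("output.zarr")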
```

### Converting to Cloud Optimized GeoTIFF

Converting an existing GeoTIFF to a COG is not required, but it may improve rendering speed. The `COG` output driver requires GDAL 3.1 or later.

```bash
gdal_translate -of COG -co COMPRESS=ZSTD input.tif output.tif
```

## Next Steps

* [Access Your Own Data](https://docs.earthscale.ai/earthscale-documentation/access-your-own-data) - Connect your cloud storage
* [Raster Mosaic and Time Series Support](https://docs.earthscale.ai/earthscale-documentation/advanced-dataset-configuration) - Configure multi-file datasets
* [Local File Upload](https://docs.earthscale.ai/earthscale-documentation/access-your-own-data/local-file-upload) - Upload from your computer
