### Start Local Development Server

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/README.md

Starts a local development server for the website. Changes are reflected live without restarting the server.

```console
yarn start
```

--------------------------------

### NPM Run Commands Overview

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/README.md

Lists available lifecycle scripts for the project, including start, build, deploy, and more.

```shell
$ npm run
Lifecycle scripts included in documentation:
  start
    docusaurus start

available via `npm run-script`:
  docusaurus
    docusaurus
  build
    docusaurus build
  swizzle
    docusaurus swizzle
  deploy
    docusaurus deploy
  clear
    docusaurus clear
  serve
    docusaurus serve
  write-translations
    docusaurus write-translations
  write-heading-ids
    docusaurus write-heading-ids

$ npm run clear
$ npm run start
```

--------------------------------

### Install Website Dependencies

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/README.md

Installs the necessary dependencies for the website project using Yarn.

```console
yarn install
```

--------------------------------

### SQL Dependency for Spark 3.x and Scala 2.12

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Use this command to start Spark SQL with the osm4scala connector dependency for Spark 3.x and Scala 2.12.

```shell
bin/spark-sql --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11'
```

--------------------------------

### Run Docker Image for Jupyter Notebook

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

This command starts the jupyter/all-spark-notebook Docker image, exposing necessary ports for Spark and Jupyter. It's useful for setting up a Spark-enabled notebook environment.

```bash
docker run -e JUPYTER_ENABLE_LAB=yes -d -p 8888:8888 -p 4040:4040 -p 4041:4041 jupyter/all-spark-notebook
```

--------------------------------

### Compile Protobuf Source Code

Source: https://github.com/simplexspatial/osm4scala/blob/master/README.md

Run this command to compile the protobuf source code. This is a required step when setting up the project.

```shell
sbt compile
```

--------------------------------

### PySpark Dependency for Spark 3.x and Scala 2.12

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Use this command to start PySpark with the osm4scala connector dependency for Spark 3.x and Scala 2.12.

```shell
bin/pyspark --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11'
```

--------------------------------

### Spark Shell Dependency for Spark 3.x and Scala 2.12

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Use this command to start the Spark Shell with the osm4scala connector dependency for Spark 3.x and Scala 2.12.

```shell
bin/spark-shell --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11'
```

--------------------------------

### Count Node Primitives in a PBF File

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/standalone-scala-library.mdx

This example demonstrates how to count all node primitives in a PBF file using EntityIterator.fromPbf. It leverages Scala's functional programming features for concise data processing.

```scala
EntityIterator.fromPbf(inputStream).count(_.osmModel == OSMTypes.Node)
```

--------------------------------

### Create DataFrame from OSM PBF (Scala)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Read OSM PBF data into a Spark DataFrame using the 'osm.pbf' format. This example demonstrates counting primitives by type.

```scala
import com.acervera.osm4scala.spark.OsmSqlEntity
import org.apache.spark.sql.SparkSession

object PrimitivesCounter {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Primitives counter")
      .getOrCreate()

    spark.read
      .format("osm.pbf")
      .load(args(0))
      .groupBy(OsmSqlEntity.FIELD_TYPE)
      .count
      .show
  }
}
```

--------------------------------

### Build Website for Deployment

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/README.md

Generates static content for the website into the 'build' directory, ready for hosting.

```console
yarn build
```

--------------------------------

### Deploy Documentation Site

Source: https://github.com/simplexspatial/osm4scala/blob/master/README.md

Deploy the project documentation and site. Ensure Node.js is managed with nvm and set the GIT_USER and USE_SSH environment variables.

```bash
git checkout v1.*.*
cd website
nvm use
export GIT_USER=<username>; export USE_SSH=true; npm run deploy
```

--------------------------------

### Initialize Spark with osm4scala Package

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/static/notebooks/spylon_notebook_example.ipynb

Use this cell to initialize a Spark environment and include the osm4scala package. Ensure the package version matches your Spark and Scala versions.

```python
%%init_spark
launcher.packages = ["com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11"]
```
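
The coordinate in the cell above follows the usual Maven `group:artifact:version` layout, with the Scala binary version appended to the artifact name. As a minimal sketch, the helper below assembles that string so the Scala suffix and release number are set in one place. The function name `osm4scala_package` is hypothetical and is not part of osm4scala or Spylon, and the sketch only covers the `osm4scala-spark3-shaded` artifact shown in this document.

```python
GROUP_ID = "com.acervera.osm4scala"
ARTIFACT = "osm4scala-spark3-shaded"  # Spark 3.x connector, as used in this document


def osm4scala_package(scala_binary: str = "2.12", version: str = "1.0.11") -> str:
    """Build a group:artifact_scala:version coordinate (hypothetical helper)."""
    return f"{GROUP_ID}:{ARTIFACT}_{scala_binary}:{version}"


print(osm4scala_package())
```

With the defaults, the result is the same coordinate used in the cell above, so it could be assigned as `launcher.packages = [osm4scala_package()]`.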

--------------------------------

### Deploy Website to GitHub Pages

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/README.md

Builds and deploys the website to the 'gh-pages' branch, suitable for GitHub Pages hosting.

```console
GIT_USER=<Your GitHub username> USE_SSH=true yarn deploy
```

--------------------------------

### Release Project

Source: https://github.com/simplexspatial/osm4scala/blob/master/README.md

Checkout the master branch and execute the release command to initiate the project release process.

```shell
git checkout master
sbt release
```

--------------------------------

### Bundle and Release to Sonatype

Source: https://github.com/simplexspatial/osm4scala/blob/master/README.md

After publishing signed artifacts, execute this command to bundle and release them to Sonatype.

```shell
sbt sonatypeBundleRelease
```

--------------------------------

### Configure SBT for Dependency Shading

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

This Scala code configures SBT to shade the 'com.google.protobuf' package to 'shadeproto' to resolve dependency conflicts with older versions of the Google Protobuf library used by Spark and Hadoop.

```scala
assemblyShadeRules in assembly := Seq(
ShadeRule
  .rename("com.google.protobuf.**" -> "shadeproto.@1")
  .inAll
)
```
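
To make the effect of the rename rule concrete, the following sketch mimics it in plain Python: `**` matches the rest of the class name (including dots) and `@1` is replaced by whatever that wildcard matched. It illustrates the mapping only and is not how sbt-assembly is implemented; the function name `shade` is hypothetical.

```python
import re

# Pattern and result copied from the shade rule above.
PATTERN = "com.google.protobuf.**"
RESULT = "shadeproto.@1"


def shade(class_name: str) -> str:
    """Return the renamed class name, or the input if the pattern does not match."""
    prefix = re.escape(PATTERN[: -len("**")])  # literal part before the wildcard
    match = re.fullmatch(prefix + r"(.+)", class_name)
    if match is None:
        return class_name
    return RESULT.replace("@1", match.group(1))


print(shade("com.google.protobuf.Message"))
print(shade("org.apache.spark.sql.SparkSession"))
```

Classes under `com.google.protobuf` move to the `shadeproto` package, while everything else keeps its original name.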

--------------------------------

### Submit Spark Application (Scala)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Submit a Spark application that uses the osm4scala connector. The '--packages' argument is optional if the dependency is included in the deployable artifact.

```shell
bin/spark-submit \
    --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11' \
    examples/spark-documentation/target/scala-2.12/osm4scala-examples-spark-documentation_2.12-1.0.11.jar \
    /tmp/osm/monaco-anonymized.osm.pbf
```

--------------------------------

### Create Table for OSM Data in Spark SQL

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Create a table in Spark SQL to read OSM data from PBF files using the osm.pbf provider.

```sql
CREATE TABLE osm USING osm.pbf LOCATION '<osm files path here>';
```

--------------------------------

### Run osm4scala-spark-utilities Counter Command

Source: https://github.com/simplexspatial/osm4scala/blob/master/examples/spark-utilities/README.md

Submit a Spark job to count OSM primitives (Nodes, Ways, Relations) from PBF files. Specify input path, output path, coalesce factor, and output format.

```shell
bin/spark-submit \
    --packages 'com.github.scopt:scopt_2.12:3.7.1,com.acervera.osm4scala:osm4scala-spark-shaded_2.12:1.0.11' \
    --class com.acervera.osm4scala.examples.spark.Driver \
    "osm4scala-examples-spark-utilities_2.12-1.0.11.jar" \
    counter \
    -i <pbf files path> \
    -o <output> \
    -c 1 \
    -f csv
```

--------------------------------

### Count Primitives by Type using SQL (Scala)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Execute a SQL query on the 'osm' temporary view to count the number of primitives for each type. Assumes the 'osm' view has been created.

```scala
spark.sql("select type, count(*) as num_primitives from osm group by type").show()
```

--------------------------------

### Run osm4scala-spark-utilities Tag Keys Command

Source: https://github.com/simplexspatial/osm4scala/blob/master/examples/spark-utilities/README.md

Submit a Spark job to extract tag keys from OSM primitives (Nodes, Ways, Relations) in PBF files. Specify input path, output path, coalesce factor, and output format.

```shell
bin/spark-submit \
    --packages 'com.github.scopt:scopt_2.12:3.7.1,com.acervera.osm4scala:osm4scala-spark-shaded_2.12:1.0.11' \
    --class com.acervera.osm4scala.examples.spark.Driver \
    "osm4scala-examples-spark-utilities_2.12-1.0.11.jar" \
    tag_keys \
    -i <pbf files path> \
    -o <output> \
    -c 1 \
    -f csv
```
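
The `counter` and `tag_keys` invocations differ only in the command name, so a script that launches them can share one argument builder. The sketch below is an assumption about how such a wrapper might look, not something shipped with osm4scala: the function name `utilities_command` and the example output path `/tmp/osm/out` are hypothetical, while the flags and coordinates are copied from the commands in this document.

```python
import shlex

PACKAGES = (
    "com.github.scopt:scopt_2.12:3.7.1,"
    "com.acervera.osm4scala:osm4scala-spark-shaded_2.12:1.0.11"
)
JAR = "osm4scala-examples-spark-utilities_2.12-1.0.11.jar"
DRIVER = "com.acervera.osm4scala.examples.spark.Driver"


def utilities_command(command, input_path, output_path, coalesce=1, fmt="csv"):
    """Return the spark-submit argument list for `counter` or `tag_keys`."""
    if command not in ("counter", "tag_keys"):
        raise ValueError(f"unsupported command: {command}")
    return [
        "bin/spark-submit",
        "--packages", PACKAGES,
        "--class", DRIVER,
        JAR,
        command,
        "-i", input_path,
        "-o", output_path,
        "-c", str(coalesce),
        "-f", fmt,
    ]


# Hypothetical paths, for illustration only.
args = utilities_command("counter", "/tmp/osm/monaco-anonymized.osm.pbf", "/tmp/osm/out")
print(shlex.join(args))
```

The returned list can be passed directly to `subprocess.run(args, check=True)` from the Spark installation directory.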

--------------------------------

### Run Tests for Scala Versions

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/contributing.mdx

Execute tests for different Scala versions. Set PATCH_211 to true or false to include or exclude Scala 2.11.

```shell
PATCH_211=false sbt clean +test
```

```shell
PATCH_211=true sbt clean +test
```

--------------------------------

### Submit Spark Application (PySpark)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Submit a PySpark application using the osm4scala connector. The '--packages' flag can be omitted if the JAR is part of the deployment.

```shell
bin/spark-submit \
    --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11' \
    examples/spark-documentation/src/main/scala/com/acervera/osm4scala/examples/spark/documentation/PrimiriveCounter.py \
    /tmp/osm/monaco-anonymized.osm.pbf
```

--------------------------------

### Create Table for OSM PBF with Split Disabled (SQL)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Use this SQL snippet to create a table for OSM PBF files with the 'split' option disabled, preventing Spark from splitting individual PBF files for parallelization.

```sql
spark-sql> CREATE TABLE osm USING osm.pbf OPTIONS ( 'split' = 'false' ) LOCATION '<osm files path here>';
```

--------------------------------

### Load OSM PBF into SQL View (Scala)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Load OSM PBF files into a Spark DataFrame and create a temporary SQL view named 'osm'. This allows for SQL-based querying.

```scala
val osmDF = spark.sqlContext.read.format("osm.pbf").load("<osm files path here>")
osmDF.createOrReplaceTempView("osm")
```

--------------------------------

### Create DataFrame from OSM PBF (PySpark)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Load OSM PBF files into a PySpark DataFrame and perform a count by type. Ensure the 'osm.pbf' format is available at runtime.

```python
from pyspark.sql import SparkSession
import sys

if __name__ == '__main__':

    spark = SparkSession.builder.appName("Primitives counter").getOrCreate()

    spark.read.format("osm.pbf")\
        .load(sys.argv[1])\
        .groupBy("type")\
        .count()\
        .show()
```

--------------------------------

### Count Primitives in Full Planet PBF

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/performance.mdx

Measures the time taken to count all primitives in the full planet PBF file (approx. 4 billion elements). Memory usage is negligible.

```text
Found [3,976,885,170] primitives in /media/angelcervera/My Passport/osm/planet-latest.osm.pbf in 2,566.11 sec.
```

--------------------------------

### Add Bintray Repository for osm4scala

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/standalone-scala-library.mdx

If sbt cannot resolve the osm4scala dependencies, add this Bintray repository to your resolvers. This is typically only needed if direct resolution fails. Note that JFrog shut down Bintray in 2021, so this resolver may no longer respond.

```scala
resolvers += "osm4scala repo" at "http://dl.bintray.com/angelcervera/maven"
```

--------------------------------

### Load OSM Data in Spark Shell (Scala)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Load OSM data from PBF files into a Spark DataFrame using the osm.pbf format in Scala.

```scala
val osmDF = spark.read.format("osm.pbf").load("<osm files path here>")
```

--------------------------------

### Load and Query OSM PBF Data with Spark (Python)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/static/notebooks/spylon_notebook_example.ipynb

Load an OSM PBF file into a Spark DataFrame and filter for elements with a 'highway' tag equal to 'traffic_signals'. This is useful for extracting specific geographic features.

```python
%%python
osm_df = spark.read.format("osm.pbf").load("/home/jovyan/work/monaco-anonymized.osm.pbf")
osm_df.select("latitude", "longitude").where("element_at(tags, 'highway') == 'traffic_signals'").show()
```
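
The same filter expression works for any tag key and value. As a small sketch, the helper below builds the predicate string so it can be reused across queries. The name `tag_filter` is hypothetical, and rejecting quotes and backslashes is a simplification made here rather than proper SQL escaping.

```python
def tag_filter(key: str, value: str) -> str:
    """Build a Spark SQL predicate matching rows whose tags map has key == value."""
    for text in (key, value):
        # Keep the sketch simple: refuse characters that would need escaping.
        if "'" in text or "\\" in text:
            raise ValueError(f"unsupported character in {text!r}")
    return f"element_at(tags, '{key}') == '{value}'"


print(tag_filter("highway", "traffic_signals"))
```

In a notebook with the DataFrame from the cell above, this could be used as `osm_df.where(tag_filter("highway", "traffic_signals"))`.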

--------------------------------

### Count Primitives in Spain PBF

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/performance.mdx

Measures the time taken to count all primitives in the Spain PBF file. Memory usage is negligible.

```text
Found [67,976,861] primitives in /home/angelcervera/projects/osm/spain-latest.osm.pbf in 32.44 sec.
```

--------------------------------

### Add osm4scala-core Dependency with Maven

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/standalone-scala-library.mdx

Integrate the osm4scala-core library into your Maven project by adding the provided dependency configuration to your pom.xml file.

```xml
<dependency>
  <groupId>com.acervera.osm4scala</groupId>
  <artifactId>osm4scala-core_${scala-version}</artifactId>
  <version>${version}</version>
</dependency>
```

--------------------------------

### Add osm4scala-core Dependency with sbt

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/standalone-scala-library.mdx

Include the osm4scala-core library in your Scala project using sbt by adding the specified dependency to your build.sbt file.

```scala
libraryDependencies += "com.acervera.osm4scala" %% "osm4scala-core" % "<version>"
```

--------------------------------

### Publish Signed Artifacts

Source: https://github.com/simplexspatial/osm4scala/blob/master/README.md

Publish signed artifacts to Maven Central. Ensure GPG keys and Sonatype credentials are set up correctly. Use the PATCH_211 flag to manage Scala 2.11 compatibility.

```shell
git checkout v1.*.*
sbt clean
PATCH_211=false sbt +publishSigned
```

```shell
PATCH_211=true sbt +publishSigned
```

--------------------------------

### Testing with PATCH_211 Flag

Source: https://github.com/simplexspatial/osm4scala/blob/master/README.md

Run tests for different Scala versions using the PATCH_211 flag. Set to 'false' for default behavior or 'true' to enable Scala 2.11 compatibility.

```shell
PATCH_211=false sbt +test
```

```shell
PATCH_211=true sbt +test
```

--------------------------------

### Extract Traffic Lights as POIs in Spark Shell

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Extract latitude, longitude, and tags for all traffic lights from the dataset using Spark SQL.

```scala
scala> osmDF.select("latitude", "longitude", "tags").where("element_at(tags, 'highway') == 'traffic_signals'").show(10,false)
+------------------+-------------------+------------------------------------------------------------------------------+
|latitude          |longitude          |tags                                                                          |
+------------------+-------------------+------------------------------------------------------------------------------+
|54.59766649999997 |-5.8889806000000045|[highway -> traffic_signals]                                                  |
|54.58006689999997 |-5.938683200000003 |[highway -> traffic_signals, traffic_signals -> signal]                       |
|54.58260049999997 |-5.946187600000005 |[direction -> backward, highway -> traffic_signals, traffic_signals -> signal]|
|51.90097769999996 |-8.470285700000005 |[highway -> traffic_signals]                                                  |
|51.901616299999965|-8.470139700000004 |[highway -> traffic_signals]                                                  |
|51.89978239999997 |-8.465829200000002 |[highway -> traffic_signals]                                                  |
|51.89707529999997 |-8.474892800000001 |[highway -> traffic_signals]                                                  |
|51.89784849999997 |-8.466895200000002 |[highway -> traffic_signals]                                                  |
|51.89547809999997 |-8.476100900000002 |[highway -> traffic_signals]                                                  |
|51.89772569999997 |-8.477145100000003 |[highway -> traffic_signals]                                                  |
+------------------+-------------------+------------------------------------------------------------------------------+
only showing top 10 rows

```

```python
>>> osmDF.select("latitude", "longitude", "tags").where("element_at(tags, 'highway') == 'traffic_signals'").show(10,False)
+------------------+-------------------+------------------------------------------------------------------------------+
|latitude          |longitude          |tags                                                                          |
+------------------+-------------------+------------------------------------------------------------------------------+
|54.59766649999997 |-5.8889806000000045|[highway -> traffic_signals]                                                  |
|54.58006689999997 |-5.938683200000003 |[highway -> traffic_signals, traffic_signals -> signal]                       |
|54.58260049999997 |-5.946187600000005 |[direction -> backward, highway -> traffic_signals, traffic_signals -> signal]|
|51.90097769999996 |-8.470285700000005 |[highway -> traffic_signals]                                                  |
|51.901616299999965|-8.470139700000004 |[highway -> traffic_signals]                                                  |
|51.89978239999997 |-8.465829200000002 |[highway -> traffic_signals]                                                  |
|51.89707529999997 |-8.474892800000001 |[highway -> traffic_signals]                                                  |
|51.89784849999997 |-8.466895200000002 |[highway -> traffic_signals]                                                  |
|51.89547809999997 |-8.476100900000002 |[highway -> traffic_signals]                                                  |
|51.89772569999997 |-8.477145100000003 |[highway -> traffic_signals]                                                  |
+------------------+-------------------+------------------------------------------------------------------------------+
only showing top 10 rows

```

```sql
spark-sql> select latitude, longitude, tags from osm where type = 0 and element_at(tags, "highway") == 'traffic_signals' limit 10;
40.42125            -3.6844500000000004 {"crossing":"traffic_signals","crossing_ref":"zebra","highway":"traffic_signals"}

```

--------------------------------

### Read OSM PBF with Split Disabled (Scala)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Use this Scala snippet to read OSM PBF files with the 'split' option disabled, preventing Spark from splitting individual PBF files for parallelization.

```scala
scala> val osmDF = spark.read.format("osm.pbf").option("split", "false").load("<osm files path here>")
```

--------------------------------

### Add osm4scala Spark Dependency (SBT)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Include the osm4scala Spark shaded dependency in your SBT project to use it in Spark applications. This is necessary for Scala and Python environments.

```sbt
libraryDependencies += "com.acervera.osm4scala" % "osm4scala-spark3-shaded_2.12" % "1.0.11"
```

--------------------------------

### Extract Unique Tags by Primitive Type from Spain PBF

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/performance.mdx

Measures the time taken to extract unique tags from specific primitive types (Way) in the Spain PBF file. The list of tags is stored in a text file.

```text
Found [2,451] different tags in primitives of type [Way] in /home/angelcervera/projects/osm/spain-latest.osm.pbf. List stored in /home/angelcervera/projects/osm/spain-latest.tags.txt. Time to process: 33.47 sec.
```

--------------------------------

### Load OSM Data in PySpark

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Load OSM data from PBF files into a Spark DataFrame using the osm.pbf format in PySpark.

```python
osmDF = spark.read.format("osm.pbf").load("<osm files path here>")
```

--------------------------------

### Load and Query OSM PBF Data with Spark (Scala)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/static/notebooks/spylon_notebook_example.ipynb

Load an OSM PBF file into a Spark DataFrame and filter for elements with a 'highway' tag equal to 'traffic_signals'. This is useful for extracting specific geographic features.

```scala
val osmDF = spark.read.format("osm.pbf").load("/home/jovyan/work/monaco-anonymized.osm.pbf")
osmDF.select("latitude", "longitude")
    .where("element_at(tags, 'highway') == 'traffic_signals'")
    .show
```

--------------------------------

### Extract Primitive Metadata with Spark SQL

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Query the 'osm' view for the id, type, version, user ID, user name, and a formatted timestamp of every primitive whose user ID is not null, then show the first 5 rows without truncating column values. The query does not filter by type, so nodes, ways, and relations can all appear in the result.

```scala
spark.sql("select id, type, info.version, info.userId, info.userName, date_format(info.timestamp, \"dd-MMM-y kk:mm:ss z\") as timestamp from osm where info.userId IS NOT NULL").show(5, false)
```

--------------------------------

### Extract Unique Tags from Spain PBF

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/performance.mdx

Measures the time taken to extract all unique tags from the Spain PBF file. The list of tags is stored in a text file.

```text
Found [4,166] different tags in /home/angelcervera/projects/osm/spain-latest.osm.pbf. List stored in /home/angelcervera/projects/osm/spain-latest.tags.txt. Time to process: 39.22 sec.
```

--------------------------------

### Extract Way Data with Spark SQL

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Query all ways from the 'osm' table, selecting their ID, the list of nodes they contain, and their tags. This is useful for analyzing linear features like roads.

```scala
spark.sql("select id, nodes, tags from osm where type = 1").show()

```

--------------------------------

### Count Specific Primitive Types in Spain PBF

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/performance.mdx

Measures the time taken to count specific primitive types (Way, Node, Relation) in the Spain PBF file. Memory usage is negligible.

```text
Found [4,839,505] primitives of type [Way] in /home/angelcervera/projects/osm/spain-latest.osm.pbf in 31.72 sec.
Found [63,006,432] primitives of type [Node] in /home/angelcervera/projects/osm/spain-latest.osm.pbf in 32.70 sec.
Found [130,924] primitives of type [Relation] in /home/angelcervera/projects/osm/spain-latest.osm.pbf in 32.66 sec.
```
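
As a quick consistency check, the three result lines can be parsed and summed. The sketch below assumes only the line format shown above; the regular expression and the `parse` function are hypothetical helpers written for this illustration.

```python
import re

LINE = re.compile(
    r"Found \[([\d,]+)\] primitives of type \[(\w+)\] in .* in ([\d.]+) sec\."
)

# Output lines copied from the block above.
OUTPUT = """\
Found [4,839,505] primitives of type [Way] in /home/angelcervera/projects/osm/spain-latest.osm.pbf in 31.72 sec.
Found [63,006,432] primitives of type [Node] in /home/angelcervera/projects/osm/spain-latest.osm.pbf in 32.70 sec.
Found [130,924] primitives of type [Relation] in /home/angelcervera/projects/osm/spain-latest.osm.pbf in 32.66 sec.
"""


def parse(output):
    """Map each primitive type to a (count, seconds) pair."""
    results = {}
    for line in output.splitlines():
        match = LINE.fullmatch(line.strip())
        if match:
            count = int(match.group(1).replace(",", ""))
            results[match.group(2)] = (count, float(match.group(3)))
    return results


results = parse(OUTPUT)
total = sum(count for count, _ in results.values())
print(total)
for kind, (count, seconds) in results.items():
    print(f"{kind}: {count / seconds:,.0f} primitives/sec")
```

The three counts add up to 67,976,861, which matches the total reported for the same file in "Count Primitives in Spain PBF".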

--------------------------------

### Count OSM Primitives in Spark Shell

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Count the number of different OSM primitive types (nodes, ways, relations) in the dataset using Spark SQL.

```scala
scala> osmDF.groupBy("type").count().show()
+----+--------+
|type|   count|
+----+--------+
|   1| 2096455|
|   2|   91971|
|   0|19426617|
+----+--------+

```

```python
>>> osmDF.groupBy("type").count().show()
+----+--------+
|type|   count|
+----+--------+
|   1| 2096455|
|   2|   91971|
|   0|19426617|
+----+--------+

```

```sql
spark-sql> select type, count(type) from osm group by type
1   338795
2   10357
0   2328075

```
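
The `type` column holds numeric codes. Going by the queries elsewhere in this document (`type = 0` for nodes, `type = 1` for ways, `type = 2` for relations), a small lookup makes the counts easier to read. This is a sketch with hypothetical names (`TYPE_NAMES`, `label_counts`), not part of the connector.

```python
# Type codes as used by the queries in this document:
# `type = 0` selects nodes, `type = 1` ways, `type = 2` relations.
TYPE_NAMES = {0: "node", 1: "way", 2: "relation"}


def label_counts(rows):
    """Turn (type_code, count) pairs into a name -> count dictionary."""
    return {TYPE_NAMES[code]: count for code, count in rows}


# Rows copied from the Scala/PySpark output above.
rows = [(1, 2096455), (2, 91971), (0, 19426617)]
print(label_counts(rows))
```

In PySpark, the pairs could be collected with `[(r["type"], r["count"]) for r in osmDF.groupBy("type").count().collect()]`.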

--------------------------------

### Read OSM PBF with Split Disabled (PySpark)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Use this PySpark snippet to read OSM PBF files with the 'split' option disabled, preventing Spark from splitting individual PBF files for parallelization.

```python
>>> osmDF = spark.read.format("osm.pbf").option("split", "false").load("<osm files path here>")
```

--------------------------------

### Extract All Unique Tag Keys using SQL (Scala)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Query the 'osm' SQL view to extract all distinct keys used in the 'tags' map column. This query uses `map_keys` and `explode` functions.

```scala
spark.sql("select distinct(explode(map_keys(tags))) as tag_key from osm order by tag_key asc").show()
```
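
For readers less familiar with `map_keys` and `explode`, the following plain-Python sketch reproduces the logic of the query on a few in-memory rows. It only illustrates what the query computes and does not use Spark; the sample maps are taken from the traffic-light output shown elsewhere in this document, and the function name `distinct_tag_keys` is hypothetical.

```python
def distinct_tag_keys(tag_maps):
    """Pure-Python equivalent of the query: distinct keys across all rows, sorted."""
    keys = set()
    for tags in tag_maps:   # one tags map per row
        keys.update(tags)   # map_keys + explode: every key of every row
    return sorted(keys)     # distinct + order by tag_key asc


sample = [
    {"highway": "traffic_signals"},
    {"highway": "traffic_signals", "traffic_signals": "signal"},
    {"direction": "backward", "highway": "traffic_signals", "traffic_signals": "signal"},
]
print(distinct_tag_keys(sample))
```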

--------------------------------

### Extract Node Data with Spark SQL

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Query all nodes from the 'osm' table, selecting their ID, coordinates, and tags. This is useful for analyzing point-based features.

```scala
spark.sql("select id, latitude, longitude, tags from osm where type = 0").show()

```

--------------------------------

### Add osm4scala Spark Dependency (Maven)

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Include the osm4scala Spark shaded dependency in your Maven project for use in Spark applications. This applies to both Scala and Python.

```xml
<dependency>
    <groupId>com.acervera.osm4scala</groupId>
    <artifactId>osm4scala-spark3-shaded_2.12</artifactId>
    <version>1.0.11</version>
</dependency>
```

--------------------------------

### Extract Relation Data with Spark SQL

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/spark-connector.mdx

Query all relations from the 'osm' table, selecting their ID, the members they relate, and their tags. This is useful for analyzing complex geographic structures.

```scala
spark.sql("select id, relations, tags from osm where type = 2").show()

```

--------------------------------

### Parallel Counting with Scala Future.traverse

Source: https://github.com/simplexspatial/osm4scala/blob/master/website/docs/performance.mdx

Process data in parallel using Scala's Future.traverse. This method is suitable for datasets that fit into memory, as it creates a Future for each element, potentially leading to high memory consumption.

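```scala
import java.io.InputStream
import java.util.concurrent.atomic.AtomicLong
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

  // BlobTupleIterator is provided by osm4scala; its import is omitted here.
  val counter = new AtomicLong()
  def count(pbfIS: InputStream): Long = {
    val result = Future.traverse(BlobTupleIterator.fromPbf(pbfIS))(tuple => Future {
      // count(tuple._2) is assumed to be an overload, defined elsewhere, that counts one blob.
      counter.addAndGet( count(tuple._2) )
    })
    Await.result(result, Duration.Inf)
    counter.longValue()
  }
```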
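
One way to limit the memory cost described above is to cap how many tasks exist at once. The sketch below shows that idea with Python's `concurrent.futures` rather than Scala futures, so it is an analogy and not osm4scala code: `count_in_batches`, `count_blob`, and the stand-in data are all hypothetical. It pulls a fixed-size batch from the iterator, processes the batch in parallel, and only then reads the next one.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def count_in_batches(blobs, count_blob, batch_size=64, workers=4):
    """Sum count_blob(blob) over an iterator, keeping at most batch_size tasks alive."""
    total = 0
    iterator = iter(blobs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = list(islice(iterator, batch_size))
            if not batch:
                break
            # pool.map submits only this batch, then we wait for its results.
            total += sum(pool.map(count_blob, batch))
    return total


# Stand-in data: each "blob" is just a list, and counting is len().
fake_blobs = ([0] * n for n in range(1, 101))
print(count_in_batches(fake_blobs, len))
```

The trade-off is that each batch waits for its slowest task, so throughput can drop when task durations vary widely.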
