{ "cells": [ { "cell_type": "markdown", "id": "dbf4ed04", "metadata": {}, "source": [ "# Work with Measurement Data\n", "\n", "In this example notebook, we show you how to use the *Peak ODS Adapter for Apache Spark* to interact with ODS data using Spark SQL and DataFrames.\n", "\n", "The first section covers configuring the Spark framework and the *Peak ODS Adapter for Apache Spark*. The fun starts with \"Work with Measurement Data\".\n", "\n", "Happy sparking!\n", "\n" ] }, { "cell_type": "markdown", "id": "044c0c76", "metadata": {}, "source": [ "## Initialize Spark\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "dbca34ba", "metadata": {}, "source": [ "### Configure Spark\n", "\n", "Initialize the Spark context and configure it to use the *Peak ODS Adapter for Apache Spark* as a plugin.\n", "\n", "In this example we create and connect to a local Spark Master.\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "naughty-grenada", "metadata": {}, "outputs": [], "source": [ "from pyspark import SparkConf\n", "from pyspark.sql import SparkSession\n", "\n", "# Add the adapter JAR to the Spark classpath\n", "conf = SparkConf().set(\"spark.jars\", \"/target/spark-ods.jar\")\n", "# Render DataFrames as tables when a cell evaluates to one\n", "conf.set(\"spark.sql.repl.eagerEval.enabled\", True)\n", "\n", "# Create (or reuse) a session on a local master that uses all available cores\n", "spark = SparkSession.builder.master('local[*]').config(conf=conf).getOrCreate() # or 'spark://spark-master:7077'\n", "sc = spark.sparkContext" ] }, { "cell_type": "markdown", "id": "pursuant-natural", "metadata": {}, "source": [ "### Initialize the Peak ODS Adapter for Apache Spark\n", "\n", "To work with the *Peak ODS Adapter for Apache Spark*, you need to define the connection information `conInfo` for the *Peak ODS Server* together with the location of the bulk data files on disk.\n", "\n", "The connection information is then passed to the `connectionManager` to establish the ODS connection. 
This `odsConnection` must be passed to every Spark ODS operation.\n", "\n", "> You have to add an override for the ODS MULTI_VOLUME symbol `DISC1` to access the bulk data files in the Spark environment. " ] }, { "cell_type": "code", "execution_count": 2, "id": "functioning-apollo", "metadata": {}, "outputs": [], "source": [ "conInfo = {\n", " \"url\": \"http://nvhdemo:8080/api/\",\n", " \"user\": \"sa\",\n", " \"password\": \"sa\",\n", " \"override.symbol.DISC1\": \"file:///data/NVH/\"\n", "}\n", "\n", "connectionManager = sc._jvm.com.peaksolution.sparkods.ods.ConnectionManager.instance\n", "odsConnection = connectionManager.createODSConnection(conInfo)" ] }, { "cell_type": "markdown", "id": "da959089", "metadata": {}, "source": [ "## Work with Measurement Data\n", "\n", "In the previous chapter you learned how to work with instance data. Now let's have a look at the actual measurement data.\n", "You use `format(\"ods\")` to load measurement data.\n", "\n", "In our example we're looking for a measurement with a specific \"Id\", but you may want to try fancier queries..." ] }, { "cell_type": "code", "execution_count": 3, "id": "b60d4e2a", "metadata": {}, "outputs": [], "source": [ "df = spark.read.format(\"ods\").options(**odsConnection).load(\"where MeaResult.Id = 3\")" ] }, { "cell_type": "markdown", "id": "7976ac0c", "metadata": {}, "source": [ "You can now look at the first 10 rows..." ] }, { "cell_type": "code", "execution_count": 4, "id": "99e5cb87", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
idref | channel01 | channel02 | channel03 | channel04 | channel05 | channel06 | channel07 | channel08 | channel09 | channel10 | x-axis |
---|---|---|---|---|---|---|---|---|---|---|---|
NVHDEMO_SubMatrix_3 | 4.38541E-6 | 2.02778 | -4.44111 | -4.51025 | 1.86265E-6 | -1.74623E-7 | -0.192593 | 0.770431 | -0.579521 | 0.371926 | 1 |
NVHDEMO_SubMatrix_3 | 4.38541E-6 | 2.02778 | -2.03551 | -4.51025 | 1.86265E-6 | -1.74623E-7 | -0.192593 | 0.770431 | -0.579521 | 0.371926 | 2 |
NVHDEMO_SubMatrix_3 | 4.38541E-6 | 2.02778 | -4.44111 | -4.51025 | -6.52153 | -1.74623E-7 | -0.192593 | 0.770431 | -0.579521 | 0.371926 | 3 |
NVHDEMO_SubMatrix_3 | 2.40175 | 2.02778 | -4.44111 | 2.00455 | -6.52153 | -1.74623E-7 | -0.192593 | 0.770431 | -0.579521 | 0.371926 | 4 |
NVHDEMO_SubMatrix_3 | 4.38541E-6 | -0.368683 | -4.44111 | 2.00455 | -6.52153 | -1.74623E-7 | -0.192593 | 0.770431 | -0.579521 | 0.371926 | 5 |
NVHDEMO_SubMatrix_3 | 4.38541E-6 | -0.368683 | -2.03551 | -4.51025 | 1.86265E-6 | -1.74623E-7 | -0.192593 | 0.770431 | -0.579521 | 0.371926 | 6 |
NVHDEMO_SubMatrix_3 | 2.40175 | 2.02778 | -4.44111 | -4.51025 | -6.52153 | -1.74623E-7 | -0.192593 | 0.770431 | -0.579521 | 0.371926 | 7 |
NVHDEMO_SubMatrix_3 | 4.38541E-6 | -0.368683 | -2.03551 | -4.51025 | 1.86265E-6 | -1.74623E-7 | -0.192593 | 0.770431 | -0.579521 | 0.371926 | 8 |
NVHDEMO_SubMatrix_3 | 4.38541E-6 | 2.02778 | -4.44111 | -4.51025 | -6.52153 | -5.43436 | -0.192593 | 0.577823 | -0.579521 | 0.371926 | 9 |
NVHDEMO_SubMatrix_3 | 4.38541E-6 | 2.02778 | -2.03551 | -4.51025 | 1.86265E-6 | -1.74623E-7 | -0.192593 | 0.770431 | -0.579521 | 0.371926 | 10 |