{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n", " \n", " Time Series Data Analysis with Python\n", "

\n", " Deanna Spindler

\n", " IMSG at NCEP/EMC
\n", " Verification, Post-Processing and Product Generation Branch
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

What is pandas?

\n", "

\n", "The name comes from panel data, a statistics term for multidimensional datasets.

\n", "A high-performance open source library for tabular data manipulation and analysis,\n", " developed by Wes McKinney in 2008.

" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "

What does it do?

\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Example

\n", " \n", "

Using pandas DataFrames to validate WAVEWATCH III model output with NDBC buoy data.

\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "

\n", "Python libraries provide a complete toolkit for data science and analysis:

\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

First things first

" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy\n", "import pandas\n", "import tarfile\n", "import xarray \n", "import netCDF4\n", "from datetime import datetime,timedelta\n", "from dateutil.relativedelta import relativedelta" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "

Start by choosing a buoy and period of interest

" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "buoyID='41008'\n", "year=2018\n", "month=6" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Get the quality controlled NDBC data

" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "url='https://dods.ndbc.noaa.gov/thredds/dodsC/data/stdmet/'+buoyID+'/'+ \\\n", " buoyID+'h'+str(year)+'.nc'\n", "ncdata=xarray.open_dataset(url,decode_times=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "

Select the specific month of the year

" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "startat=datetime(year,month,1)\n", "stopat=startat+relativedelta(months=1)-relativedelta(days=1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "

Subset the data to this time period, and make it a DataFrame
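
One thing to watch when subsetting: the start plus one month minus one day lands on 00:00 of the last day, and pandas label slices include both endpoints, so most of that final day can fall outside the slice. A minimal sketch with a made-up hourly series (the series and the name next_month are assumptions for illustration, not part of the original workflow):

```python
from datetime import datetime

import pandas
from dateutil.relativedelta import relativedelta

# Hypothetical hourly record standing in for the buoy observations
times = pandas.date_range('2018-05-31 00:50', '2018-07-01 23:50', freq='60min')
series = pandas.Series(range(len(times)), index=times)

startat = datetime(2018, 6, 1)
stopat = startat + relativedelta(months=1) - relativedelta(days=1)

# Label slicing is inclusive, so this ends at 2018-06-30 00:00
subset = series.loc[startat:stopat]

# An exclusive bound at the first instant of the next month keeps all of June
next_month = startat + relativedelta(months=1)
whole_month = series[(series.index >= startat) & (series.index < next_month)]

print(subset.index.max(), whole_month.index.max())
```

Whether the last day matters depends on the comparison being made; the sketch only shows where the boundary falls.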

" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "data=ncdata.sel(time=slice(startat,stopat)).to_dataframe()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Take a look at what is there

" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "Index(['wind_dir', 'wind_spd', 'gust', 'wave_height', 'dominant_wpd',\n", " 'average_wpd', 'mean_wave_dir', 'air_pressure', 'air_temperature',\n", " 'sea_surface_temperature', 'dewpt_temperature', 'visibility',\n", " 'water_level'],\n", " dtype='object')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.keys()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Another way is to look at the first few rows of the DataFrame

" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " wind_dir wind_spd gust \\\n", "latitude longitude time \n", "31.4 -80.867996 2018-06-01 00:50:00 172.0 8.2 9.4 \n", " 2018-06-01 01:50:00 178.0 7.5 8.5 \n", " 2018-06-01 02:50:00 186.0 7.0 8.2 \n", " 2018-06-01 03:50:00 200.0 6.9 8.8 \n", " 2018-06-01 04:50:00 205.0 6.3 7.3 \n", "\n", " wave_height dominant_wpd \\\n", "latitude longitude time \n", "31.4 -80.867996 2018-06-01 00:50:00 0.80 00:00:07.690000 \n", " 2018-06-01 01:50:00 0.89 00:00:07.140000 \n", " 2018-06-01 02:50:00 0.80 00:00:07.690000 \n", " 2018-06-01 03:50:00 0.86 00:00:07.690000 \n", " 2018-06-01 04:50:00 0.79 00:00:03.700000 \n", "\n", " average_wpd mean_wave_dir \\\n", "latitude longitude time \n", "31.4 -80.867996 2018-06-01 00:50:00 00:00:03.950000 100.0 \n", " 2018-06-01 01:50:00 00:00:03.940000 109.0 \n", " 2018-06-01 02:50:00 00:00:03.900000 113.0 \n", " 2018-06-01 03:50:00 00:00:04.160000 103.0 \n", " 2018-06-01 04:50:00 00:00:04.080000 171.0 \n", "\n", " air_pressure air_temperature \\\n", "latitude longitude time \n", "31.4 -80.867996 2018-06-01 00:50:00 1014.500000 25.700001 \n", " 2018-06-01 01:50:00 1015.200012 25.700001 \n", " 2018-06-01 02:50:00 1015.200012 25.700001 \n", " 2018-06-01 03:50:00 1015.299988 25.799999 \n", " 2018-06-01 04:50:00 1014.799988 25.600000 \n", "\n", " sea_surface_temperature \\\n", "latitude longitude time \n", "31.4 -80.867996 2018-06-01 00:50:00 26.100000 \n", " 2018-06-01 01:50:00 25.799999 \n", " 2018-06-01 02:50:00 25.700001 \n", " 2018-06-01 03:50:00 25.600000 \n", " 2018-06-01 04:50:00 25.600000 \n", "\n", " dewpt_temperature visibility \\\n", "latitude longitude time \n", "31.4 -80.867996 2018-06-01 00:50:00 NaN NaN \n", " 2018-06-01 01:50:00 NaN NaN \n", " 2018-06-01 02:50:00 NaN NaN \n", " 2018-06-01 03:50:00 NaN NaN \n", " 2018-06-01 04:50:00 NaN NaN \n", "\n", " water_level \n", "latitude longitude time \n", "31.4 -80.867996 2018-06-01 00:50:00 NaN \n", " 2018-06-01 01:50:00 NaN \n", " 2018-06-01 02:50:00 NaN \n", " 
2018-06-01 03:50:00 NaN \n", " 2018-06-01 04:50:00 NaN " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

I like to index on a datetime stamp, so let's reset the index
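
As a small illustration of what resetting the index does, here is a sketch on a two-row frame built by hand (the values and the name flat are made up; only the index layout mirrors the buoy DataFrame):

```python
import pandas

# Hypothetical frame with the same three-level index as the buoy data
times = pandas.to_datetime(['2018-06-01 00:50', '2018-06-01 01:50'])
index = pandas.MultiIndex.from_product(
    [[31.4], [-80.868], times], names=['latitude', 'longitude', 'time'])
frame = pandas.DataFrame({'wave_height': [0.80, 0.89]}, index=index)

# reset_index turns the index levels into ordinary columns
flat = frame.reset_index()

# Round the observation time to the nearest hour and use it as the new index
flat.index = flat['time'].dt.round('60min')
flat.index.name = 'datetime'

print(flat)
```

Rounding to the hour lets the buoy times (reported at 50 minutes past) line up with model times on the hour.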

" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "data=data.reset_index()\n", "data.index=data['time'].dt.round('1H')\n", "data.index.name='datetime'" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " latitude longitude time wind_dir \\\n", "datetime \n", "2018-06-01 01:00:00 31.4 -80.867996 2018-06-01 00:50:00 172.0 \n", "2018-06-01 02:00:00 31.4 -80.867996 2018-06-01 01:50:00 178.0 \n", "2018-06-01 03:00:00 31.4 -80.867996 2018-06-01 02:50:00 186.0 \n", "2018-06-01 04:00:00 31.4 -80.867996 2018-06-01 03:50:00 200.0 \n", "2018-06-01 05:00:00 31.4 -80.867996 2018-06-01 04:50:00 205.0 \n", "\n", " wind_spd gust wave_height dominant_wpd \\\n", "datetime \n", "2018-06-01 01:00:00 8.2 9.4 0.80 00:00:07.690000 \n", "2018-06-01 02:00:00 7.5 8.5 0.89 00:00:07.140000 \n", "2018-06-01 03:00:00 7.0 8.2 0.80 00:00:07.690000 \n", "2018-06-01 04:00:00 6.9 8.8 0.86 00:00:07.690000 \n", "2018-06-01 05:00:00 6.3 7.3 0.79 00:00:03.700000 \n", "\n", " average_wpd mean_wave_dir air_pressure \\\n", "datetime \n", "2018-06-01 01:00:00 00:00:03.950000 100.0 1014.500000 \n", "2018-06-01 02:00:00 00:00:03.940000 109.0 1015.200012 \n", "2018-06-01 03:00:00 00:00:03.900000 113.0 1015.200012 \n", "2018-06-01 04:00:00 00:00:04.160000 103.0 1015.299988 \n", "2018-06-01 05:00:00 00:00:04.080000 171.0 1014.799988 \n", "\n", " air_temperature sea_surface_temperature \\\n", "datetime \n", "2018-06-01 01:00:00 25.700001 26.100000 \n", "2018-06-01 02:00:00 25.700001 25.799999 \n", "2018-06-01 03:00:00 25.700001 25.700001 \n", "2018-06-01 04:00:00 25.799999 25.600000 \n", "2018-06-01 05:00:00 25.600000 25.600000 \n", "\n", " dewpt_temperature visibility water_level \n", "datetime \n", "2018-06-01 01:00:00 NaN NaN NaN \n", "2018-06-01 02:00:00 NaN NaN NaN \n", "2018-06-01 03:00:00 NaN NaN NaN \n", "2018-06-01 04:00:00 NaN NaN NaN \n", "2018-06-01 05:00:00 NaN NaN NaN " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

The parameter names are long, and there are columns that are not used.

\n", "\n", "Let's fix that
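
The same clean-up can also be written without inplace, by selecting the wanted columns and renaming them in one chain. A sketch on a made-up frame (the sample values and the name tidy are assumptions; the mapping is a shortened version of the one used here):

```python
import pandas

# Hypothetical frame with a few of the original column names
raw = pandas.DataFrame({
    'wind_spd': [8.2, 7.5],
    'gust': [9.4, 8.5],
    'wave_height': [0.80, 0.89],
    'visibility': [None, None],
})

params = {'wind_spd': 'u10', 'wave_height': 'Hs'}

# Keep only the columns named in params, then rename them
tidy = raw[list(params)].rename(columns=params)

print(tidy.columns.tolist())
```

This leaves the original frame untouched, which can make it easier to rerun a cell.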

" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "params={'wind_dir':'udir',\n", " 'wind_spd':'u10',\n", " 'wave_height':'Hs',\n", " 'dominant_wpd':'Tp',\n", " 'longitude':'lon',\n", " 'latitude':'lat'}\n", "dropkeys=[key for key in data if key not in params]\n", "data.drop(dropkeys,axis=1,inplace=True)\n", "data.rename(columns=params,inplace=True)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lat lon udir u10 Hs Tp\n", "datetime \n", "2018-06-01 01:00:00 31.4 -80.867996 172.0 8.2 0.80 00:00:07.690000\n", "2018-06-01 02:00:00 31.4 -80.867996 178.0 7.5 0.89 00:00:07.140000\n", "2018-06-01 03:00:00 31.4 -80.867996 186.0 7.0 0.80 00:00:07.690000\n", "2018-06-01 04:00:00 31.4 -80.867996 200.0 6.9 0.86 00:00:07.690000\n", "2018-06-01 05:00:00 31.4 -80.867996 205.0 6.3 0.79 00:00:03.700000" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

The Tp column does not look right...
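
Note that in the converted output of this notebook a period of 00:00:07.69 shows up as 7.0, so casting to whole seconds drops the fraction. If the fractional part matters, one option is the total_seconds accessor. A minimal sketch on two made-up values (the name tp is an assumption):

```python
import pandas

# Hypothetical peak periods stored as Timedelta values
tp = pandas.Series(pandas.to_timedelta(['00:00:07.690000', '00:00:03.700000']))

# total_seconds keeps the fractional part as a float
seconds = tp.dt.total_seconds()

print(seconds.tolist())
```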

" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Pandas represents Timedeltas in nanosecond resolution using 64 bit integers" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "data.Tp=data.Tp.astype('timedelta64[s]').astype(float)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lat lon udir u10 Hs Tp\n", "datetime \n", "2018-06-01 01:00:00 31.4 -80.867996 172.0 8.2 0.80 7.0\n", "2018-06-01 02:00:00 31.4 -80.867996 178.0 7.5 0.89 7.0\n", "2018-06-01 03:00:00 31.4 -80.867996 186.0 7.0 0.80 7.0\n", "2018-06-01 04:00:00 31.4 -80.867996 200.0 6.9 0.86 7.0\n", "2018-06-01 05:00:00 31.4 -80.867996 205.0 6.3 0.79 3.0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Suppose we want to look at the data in a specific column,
\n", "or a Series:

" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "Hs=data['Hs']" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/export/emc-lw-tspindle/wd20ts/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py:4291: RuntimeWarning: Invalid value encountered in percentile\n", " interpolation=interpolation)\n" ] } ], "source": [ "quartiles=numpy.percentile(Hs,[25,50,75])\n", "hs_min,hs_max=Hs.min(),Hs.max()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Min: 0.290\n", "Q1: nan\n", "Median: nan\n", "Q3: nan\n", "Max: 1.130\n" ] } ], "source": [ "print('Min: %.3f' % hs_min)\n", "print('Q1: %.3f' % quartiles[0])\n", "print('Median: %.3f' % quartiles[1])\n", "print('Q3: %.3f' % quartiles[2])\n", "print('Max: %.3f' % hs_max)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Get rid of the NaNs
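
Dropping the missing values is one way; two alternatives that skip them directly are numpy.nanpercentile and the quantile method of a Series. A sketch on made-up wave heights (the values are assumptions for illustration):

```python
import numpy
import pandas

# Hypothetical wave heights with one missing value
hs = pandas.Series([0.8, numpy.nan, 0.6, 1.0, 0.4])

# nanpercentile ignores NaN entries
q_numpy = numpy.nanpercentile(hs.values, [25, 50, 75])

# Series.quantile also skips NaN, and takes fractions instead of percentages
q_pandas = hs.quantile([0.25, 0.5, 0.75])

print(q_numpy)
print(q_pandas.values)
```

Both give the same quartiles here, computed from the four valid values.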

" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "Hs=data['Hs'].dropna()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Min: 0.290\n", "Q1: 0.460\n", "Median: 0.535\n", "Q3: 0.630\n", "Max: 1.130\n" ] } ], "source": [ "quartiles=numpy.percentile(Hs,[25,50,75])\n", "print('Min: %.3f' % Hs.min())\n", "print('Q1: %.3f' % quartiles[0])\n", "print('Median: %.3f' % quartiles[1])\n", "print('Q3: %.3f' % quartiles[2])\n", "print('Max: %.3f' % Hs.max())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

It's nice to take a quick look at the data

" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "Hs.plot(grid=True);" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

In the above example, we used the OPeNDAP server to get the quality controlled NDBC data.

\n", "\n", "If we wanted near-realtime data, we could get it the same way...

" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "url='https://dods.ndbc.noaa.gov/thredds/dodsC/data/stdmet/'+buoyID+'/'+ \\\n", " buoyID+'h9999.nc'\n", "ncdata=xarray.open_dataset(url,decode_times=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

or another way

" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "

Unidata has been developing Siphon, a suite of easy-to-use utilities for accessing remote data sources.

" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " wind_direction wind_speed wind_gust wave_height dominant_wave_period \\\n", "0 80.0 4.0 5.0 0.8 12.0 \n", "1 90.0 3.0 4.0 0.9 11.0 \n", "2 10.0 3.0 3.0 0.9 11.0 \n", "3 320.0 3.0 3.0 0.9 11.0 \n", "4 340.0 3.0 3.0 1.0 11.0 \n", "\n", " average_wave_period dominant_wave_direction pressure air_temperature \\\n", "0 9.1 100.0 1017.9 27.5 \n", "1 8.6 97.0 1017.9 27.6 \n", "2 8.1 101.0 1017.4 27.8 \n", "3 8.2 89.0 1017.0 27.8 \n", "4 7.8 88.0 1016.5 27.8 \n", "\n", " water_temperature dewpoint visibility 3hr_pressure_tendency \\\n", "0 29.0 22.0 NaN 0.9 \n", "1 28.9 21.9 NaN 1.4 \n", "2 28.8 24.3 NaN 1.8 \n", "3 28.8 22.9 NaN 2.0 \n", "4 28.8 22.7 NaN 1.7 \n", "\n", " water_level_above_mean time \n", "0 NaN 2018-09-11 15:50:00 \n", "1 NaN 2018-09-11 14:50:00 \n", "2 NaN 2018-09-11 13:50:00 \n", "3 NaN 2018-09-11 12:50:00 \n", "4 NaN 2018-09-11 11:50:00 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from siphon.simplewebservice.ndbc import NDBC\n", "df = NDBC.realtime_observations('41008')\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Next, we need some model data for the same time period.

\n", "In this example, we are going to use the archived monthly buoy files:

\n", "NCEP_1806.tar.gz
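
To show the reading pattern without the real archive, here is a sketch that builds a tiny tar file in memory and reads it back. The member names follow the NCEP_yymm_fff pattern used here, but the file contents and the simplified nine-column layout are assumptions for illustration only:

```python
import io
import tarfile

import pandas

# Build a small in-memory archive with two made-up members
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w:gz') as tar:
    for fcst in (0, 24):
        # chr(10) is a newline character
        payload = ('41008 2018 6 1 0 6.9 178 0.9 7.4' + chr(10)).encode('ascii')
        info = tarfile.TarInfo(name='NCEP_1806_{:03d}'.format(fcst))
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

cols = ['id', 'year', 'month', 'day', 'hour', 'u10', 'udir', 'Hs', 'Tp']
frames = []
with tarfile.open(fileobj=buffer, mode='r:gz') as tar:
    for fcst in (0, 24):
        member = tar.extractfile('NCEP_1806_{:03d}'.format(fcst))
        frame = pandas.read_csv(member, names=cols, sep=' ')
        # Combine the date parts into one timestamp and index on it
        frame['datetime'] = pandas.to_datetime(frame[['year', 'month', 'day', 'hour']])
        frame = frame.drop(['year', 'month', 'day', 'hour'], axis=1).set_index('datetime')
        frame['fcst'] = fcst
        frames.append(frame)

# Collect the pieces in a list and concatenate once at the end
w3 = pandas.concat(frames)
print(w3)
```

Collecting the frames in a list and calling pandas.concat once avoids growing a DataFrame inside the loop.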

" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "

MemberName:

\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "ncep_cols=['id','year','month','day','hour','u10','udir','Hs','Tp']\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "w3data=pandas.DataFrame()\n", "tar=tarfile.open('NCEP_1806.tar.gz')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "for fcst in range(0,169,24):\n", " memberName='NCEP_'+str(year)[2:]+\"{:02n}\".format(month)+ \\\n", " '_'+\"{:03n}\".format(fcst)\n", " member=tar.getmember(memberName)\n", " f=tar.extractfile(member)\n", " frame=pandas.read_csv(f,names=ncep_cols,\n", " sep=' ',\n", " usecols=[1,3,4,5,6,7,8,9,10],\n", " skipinitialspace=True,\n", " index_col=False)\n", " frame['datetime']=pandas.to_datetime(frame[['year','month','day','hour']])\n", " frame=frame.drop(['year','month','day','hour'],axis=1)\n", " frame=frame.set_index('datetime')\n", " frame['fcst']=fcst\n", " w3data=pandas.concat([w3data,frame])\n", "tar.close()\n" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id u10 udir Hs Tp fcst\n", "datetime \n", "2018-06-01 00:00:00 13130 8.4 21 1.4 13.6 0\n", "2018-06-01 06:00:00 13130 8.2 26 1.4 12.9 0\n", "2018-06-01 12:00:00 13130 8.3 26 1.4 12.5 0\n", "2018-06-01 18:00:00 13130 8.5 25 1.4 12.0 0\n", "2018-06-02 00:00:00 13130 9.7 25 1.5 5.9 0" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w3data.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

So now we have the NDBC data for buoy 41008 for June 2018, and the WW3 buoy data for all buoys for June 2018.

\n", "Our next step is to subset the w3data to just the data for buoy 41008.
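
Subsetting with a boolean mask depends on how the id column was parsed: if it came in as integers, a comparison against a string may match nothing. A sketch on a made-up frame that sidesteps the question by comparing as strings (the frame and the name buoy_id are assumptions for illustration):

```python
import pandas

# Hypothetical model records for two buoys
w3 = pandas.DataFrame({'id': [13130, 41008, 41008],
                       'Hs': [1.4, 0.9, 0.7]})

buoy_id = '41008'

# Casting the column to str makes the test independent of the parsed dtype
model = w3[w3['id'].astype(str) == buoy_id].copy()

print(model)
```

The copy at the end gives an independent frame, so later assignments do not trigger chained-assignment warnings.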

" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "buoyID='41008'\n", "model=w3data[w3data.id==buoyID].copy()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id u10 udir Hs Tp fcst\n", "datetime \n", "2018-06-01 00:00:00 41008 6.9 178 0.9 7.4 0\n", "2018-06-01 06:00:00 41008 5.7 227 0.9 7.4 0\n", "2018-06-01 12:00:00 41008 4.8 277 0.7 7.3 0\n", "2018-06-01 18:00:00 41008 3.1 182 0.6 7.3 0\n", "2018-06-02 00:00:00 41008 7.5 190 0.7 7.2 0" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.head()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0, 24, 48, 72, 96, 120, 144, 168])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.fcst.unique()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "

If interested in a single forecast:

" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "m000=model[model.fcst==0]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id u10 udir Hs Tp fcst\n", "datetime \n", "2018-06-01 00:00:00 41008 6.9 178 0.9 7.4 0\n", "2018-06-01 06:00:00 41008 5.7 227 0.9 7.4 0\n", "2018-06-01 12:00:00 41008 4.8 277 0.7 7.3 0\n", "2018-06-01 18:00:00 41008 3.1 182 0.6 7.3 0\n", "2018-06-02 00:00:00 41008 7.5 190 0.7 7.2 0" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m000.head()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([0])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m000.fcst.unique()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Recall the NDBC data

" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lat lon udir u10 Hs Tp\n", "datetime \n", "2018-06-01 01:00:00 31.4 -80.867996 172.0 8.2 0.80 7.0\n", "2018-06-01 02:00:00 31.4 -80.867996 178.0 7.5 0.89 7.0\n", "2018-06-01 03:00:00 31.4 -80.867996 186.0 7.0 0.80 7.0\n", "2018-06-01 04:00:00 31.4 -80.867996 200.0 6.9 0.86 7.0\n", "2018-06-01 05:00:00 31.4 -80.867996 205.0 6.3 0.79 3.0" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "buoy=data.copy()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Now we can merge both the model and the NDBC data for the same buoy:
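
A small sketch of how an inner merge on the index behaves when several forecasts share a valid time. The two frames are made up, and the bias calculation at the end is only an illustration of what the merged columns allow, not part of the original workflow:

```python
import pandas

# Hypothetical model rows: two forecasts valid at 06:00, one at 12:00
idx_m = pandas.to_datetime(['2018-06-01 06:00', '2018-06-01 06:00', '2018-06-01 12:00'])
model = pandas.DataFrame({'Hs': [0.9, 0.9, 0.7], 'fcst': [0, 24, 0]}, index=idx_m)

# Hypothetical hourly buoy rows
idx_b = pandas.to_datetime(['2018-06-01 05:00', '2018-06-01 06:00', '2018-06-01 07:00'])
buoy = pandas.DataFrame({'Hs': [0.79, 0.74, 0.70]}, index=idx_b)

# Only times present in both frames survive; shared names get the suffixes
both = pandas.merge(model, buoy, left_index=True, right_index=True,
                    suffixes=('_m', '_b'), how='inner')

# Mean model-minus-buoy difference for each forecast hour
bias = both.assign(diff=both['Hs_m'] - both['Hs_b']).groupby('fcst')['diff'].mean()

print(both)
print(bias)
```

Each buoy observation is repeated once per matching forecast, which is why the merged frame has several rows for the same datetime.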

" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "both=pandas.merge(model,buoy,left_index=True,right_index=True, \\\n", " suffixes=('_m','_b'),how='inner')" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datetime             id     u10_m  udir_m  Hs_m  Tp_m  fcst  lat   lon         udir_b  u10_b  Hs_b  Tp_b
2018-06-01 06:00:00  41008  5.7    227     0.9   7.4   0     31.4  -80.867996  216.0   5.6    0.74  4.0
2018-06-01 06:00:00  41008  6.1    217     0.9   7.4   24    31.4  -80.867996  216.0   5.6    0.74  4.0
2018-06-01 06:00:00  41008  5.3    228     0.9   7.4   48    31.4  -80.867996  216.0   5.6    0.74  4.0
2018-06-01 06:00:00  41008  6.0    214     0.9   7.4   72    31.4  -80.867996  216.0   5.6    0.74  4.0
2018-06-01 06:00:00  41008  6.6    208     0.8   7.5   96    31.4  -80.867996  216.0   5.6    0.74  4.0
\n", "
" ], "text/plain": [ " id u10_m udir_m Hs_m Tp_m fcst lat lon \\\n", "datetime \n", "2018-06-01 06:00:00 41008 5.7 227 0.9 7.4 0 31.4 -80.867996 \n", "2018-06-01 06:00:00 41008 6.1 217 0.9 7.4 24 31.4 -80.867996 \n", "2018-06-01 06:00:00 41008 5.3 228 0.9 7.4 48 31.4 -80.867996 \n", "2018-06-01 06:00:00 41008 6.0 214 0.9 7.4 72 31.4 -80.867996 \n", "2018-06-01 06:00:00 41008 6.6 208 0.8 7.5 96 31.4 -80.867996 \n", "\n", " udir_b u10_b Hs_b Tp_b \n", "datetime \n", "2018-06-01 06:00:00 216.0 5.6 0.74 4.0 \n", "2018-06-01 06:00:00 216.0 5.6 0.74 4.0 \n", "2018-06-01 06:00:00 216.0 5.6 0.74 4.0 \n", "2018-06-01 06:00:00 216.0 5.6 0.74 4.0 \n", "2018-06-01 06:00:00 216.0 5.6 0.74 4.0 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "both.head()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datetime             id     u10_m  udir_m  Hs_m  Tp_m  fcst  lat   lon         udir_b  u10_b  Hs_b  Tp_b
2018-06-01 06:00:00  41008  6.1    217     0.9   7.4   24    31.4  -80.867996  216.0   5.6    0.74  4.0
2018-06-01 12:00:00  41008  5.0    275     0.8   7.3   24    31.4  -80.867996  267.0   5.7    0.73  7.0
2018-06-01 18:00:00  41008  3.4    164     0.6   7.3   24    31.4  -80.867996  161.0   1.5    0.61  7.0
2018-06-02 00:00:00  41008  7.7    179     0.7   7.2   24    31.4  -80.867996  174.0   4.4    0.53  7.0
2018-06-02 06:00:00  41008  6.9    233     0.8   7.2   24    31.4  -80.867996  217.0   4.3    0.62  3.0
\n", "
" ], "text/plain": [ " id u10_m udir_m Hs_m Tp_m fcst lat lon \\\n", "datetime \n", "2018-06-01 06:00:00 41008 6.1 217 0.9 7.4 24 31.4 -80.867996 \n", "2018-06-01 12:00:00 41008 5.0 275 0.8 7.3 24 31.4 -80.867996 \n", "2018-06-01 18:00:00 41008 3.4 164 0.6 7.3 24 31.4 -80.867996 \n", "2018-06-02 00:00:00 41008 7.7 179 0.7 7.2 24 31.4 -80.867996 \n", "2018-06-02 06:00:00 41008 6.9 233 0.8 7.2 24 31.4 -80.867996 \n", "\n", " udir_b u10_b Hs_b Tp_b \n", "datetime \n", "2018-06-01 06:00:00 216.0 5.6 0.74 4.0 \n", "2018-06-01 12:00:00 267.0 5.7 0.73 7.0 \n", "2018-06-01 18:00:00 161.0 1.5 0.61 7.0 \n", "2018-06-02 00:00:00 174.0 4.4 0.53 7.0 \n", "2018-06-02 06:00:00 217.0 4.3 0.62 3.0 " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "both024=both[both.fcst==24]\n", "both024.head()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure(dpi=100)\n", "ax=plt.axes()\n", "both[both.fcst==0].plot(ax=ax,y=['Hs_b','Hs_m'])\n", "ax.grid(which='both')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

So we have both the model and the NDBC data for the buoy

\n", "\n", "in the pandas DataFrame \"both\"

" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Index(['id', 'u10_m', 'udir_m', 'Hs_m', 'Tp_m', 'fcst', 'lat', 'lon', 'udir_b',\n", " 'u10_b', 'Hs_b', 'Tp_b'],\n", " dtype='object')" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "both.keys()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Ready to calculate some basic stats...

\n", "First, see what methods the object has built-in

" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['Hs_b',\n", " 'Hs_m',\n", " 'T',\n", " 'Tp_b',\n", " 'Tp_m',\n", " '_AXIS_ALIASES',\n", " '_AXIS_IALIASES',\n", " '_AXIS_LEN',\n", " '_AXIS_NAMES',\n", " '_AXIS_NUMBERS',\n", " '_AXIS_ORDERS',\n", " '_AXIS_REVERSED',\n", " '_AXIS_SLICEMAP',\n", " '__abs__',\n", " '__add__',\n", " '__and__',\n", " '__array__',\n", " '__array_wrap__',\n", " '__bool__',\n", " '__bytes__',\n", " '__class__',\n", " '__contains__',\n", " '__copy__',\n", " '__deepcopy__',\n", " '__delattr__',\n", " '__delitem__',\n", " '__dict__',\n", " '__dir__',\n", " '__div__',\n", " '__doc__',\n", " '__eq__',\n", " '__finalize__',\n", " '__floordiv__',\n", " '__format__',\n", " '__ge__',\n", " '__getattr__',\n", " '__getattribute__',\n", " '__getitem__',\n", " '__getstate__',\n", " '__gt__',\n", " '__hash__',\n", " '__iadd__',\n", " '__iand__',\n", " '__ifloordiv__',\n", " '__imod__',\n", " '__imul__',\n", " '__init__',\n", " '__init_subclass__',\n", " '__invert__',\n", " '__ior__',\n", " '__ipow__',\n", " '__isub__',\n", " '__iter__',\n", " '__itruediv__',\n", " '__ixor__',\n", " '__le__',\n", " '__len__',\n", " '__lt__',\n", " '__matmul__',\n", " '__mod__',\n", " '__module__',\n", " '__mul__',\n", " '__ne__',\n", " '__neg__',\n", " '__new__',\n", " '__nonzero__',\n", " '__or__',\n", " '__pos__',\n", " '__pow__',\n", " '__radd__',\n", " '__rand__',\n", " '__rdiv__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__rfloordiv__',\n", " '__rmatmul__',\n", " '__rmod__',\n", " '__rmul__',\n", " '__ror__',\n", " '__round__',\n", " '__rpow__',\n", " '__rsub__',\n", " '__rtruediv__',\n", " '__rxor__',\n", " '__setattr__',\n", " '__setitem__',\n", " '__setstate__',\n", " '__sizeof__',\n", " '__str__',\n", " '__sub__',\n", " '__subclasshook__',\n", " '__truediv__',\n", " '__unicode__',\n", " '__weakref__',\n", " '__xor__',\n", " 
'_accessors',\n", " '_add_numeric_operations',\n", " '_add_series_only_operations',\n", " '_add_series_or_dataframe_operations',\n", " '_agg_by_level',\n", " '_agg_doc',\n", " '_aggregate',\n", " '_aggregate_multiple_funcs',\n", " '_align_frame',\n", " '_align_series',\n", " '_box_col_values',\n", " '_box_item_values',\n", " '_builtin_table',\n", " '_check_inplace_setting',\n", " '_check_is_chained_assignment_possible',\n", " '_check_label_or_level_ambiguity',\n", " '_check_percentile',\n", " '_check_setitem_copy',\n", " '_clear_item_cache',\n", " '_clip_with_one_bound',\n", " '_clip_with_scalar',\n", " '_combine_const',\n", " '_combine_frame',\n", " '_combine_match_columns',\n", " '_combine_match_index',\n", " '_compare_frame',\n", " '_consolidate',\n", " '_consolidate_inplace',\n", " '_construct_axes_dict',\n", " '_construct_axes_dict_for_slice',\n", " '_construct_axes_dict_from',\n", " '_construct_axes_from_arguments',\n", " '_constructor',\n", " '_constructor_expanddim',\n", " '_constructor_sliced',\n", " '_convert',\n", " '_count_level',\n", " '_create_indexer',\n", " '_cython_table',\n", " '_deprecations',\n", " '_dir_additions',\n", " '_dir_deletions',\n", " '_drop_axis',\n", " '_drop_labels_or_levels',\n", " '_ensure_valid_index',\n", " '_expand_axes',\n", " '_find_valid_index',\n", " '_from_arrays',\n", " '_from_axes',\n", " '_get_agg_axis',\n", " '_get_axis',\n", " '_get_axis_name',\n", " '_get_axis_number',\n", " '_get_axis_resolvers',\n", " '_get_block_manager_axis',\n", " '_get_bool_data',\n", " '_get_cacher',\n", " '_get_index_resolvers',\n", " '_get_item_cache',\n", " '_get_label_or_level_values',\n", " '_get_numeric_data',\n", " '_get_value',\n", " '_get_values',\n", " '_getitem_array',\n", " '_getitem_column',\n", " '_getitem_frame',\n", " '_getitem_multilevel',\n", " '_getitem_slice',\n", " '_gotitem',\n", " '_iget_item_cache',\n", " '_indexed_same',\n", " '_info_axis',\n", " '_info_axis_name',\n", " '_info_axis_number',\n", " '_info_repr',\n", " 
'_init_dict',\n", " '_init_mgr',\n", " '_init_ndarray',\n", " '_internal_names',\n", " '_internal_names_set',\n", " '_is_builtin_func',\n", " '_is_cached',\n", " '_is_copy',\n", " '_is_cython_func',\n", " '_is_datelike_mixed_type',\n", " '_is_label_or_level_reference',\n", " '_is_label_reference',\n", " '_is_level_reference',\n", " '_is_mixed_type',\n", " '_is_numeric_mixed_type',\n", " '_is_view',\n", " '_ix',\n", " '_ixs',\n", " '_join_compat',\n", " '_maybe_cache_changed',\n", " '_maybe_update_cacher',\n", " '_metadata',\n", " '_needs_reindex_multi',\n", " '_obj_with_exclusions',\n", " '_protect_consolidate',\n", " '_reduce',\n", " '_reindex_axes',\n", " '_reindex_axis',\n", " '_reindex_columns',\n", " '_reindex_index',\n", " '_reindex_multi',\n", " '_reindex_with_indexers',\n", " '_repr_data_resource_',\n", " '_repr_fits_horizontal_',\n", " '_repr_fits_vertical_',\n", " '_repr_html_',\n", " '_repr_latex_',\n", " '_reset_cache',\n", " '_reset_cacher',\n", " '_sanitize_column',\n", " '_selected_obj',\n", " '_selection',\n", " '_selection_list',\n", " '_selection_name',\n", " '_series',\n", " '_set_as_cached',\n", " '_set_axis',\n", " '_set_axis_name',\n", " '_set_is_copy',\n", " '_set_item',\n", " '_set_value',\n", " '_setitem_array',\n", " '_setitem_frame',\n", " '_setitem_slice',\n", " '_setup_axes',\n", " '_shallow_copy',\n", " '_slice',\n", " '_stat_axis',\n", " '_stat_axis_name',\n", " '_stat_axis_number',\n", " '_take',\n", " '_to_dict_of_blocks',\n", " '_try_aggregate_string_function',\n", " '_typ',\n", " '_unpickle_frame_compat',\n", " '_unpickle_matrix_compat',\n", " '_update_inplace',\n", " '_validate_dtype',\n", " '_values',\n", " '_where',\n", " '_xs',\n", " 'abs',\n", " 'add',\n", " 'add_prefix',\n", " 'add_suffix',\n", " 'agg',\n", " 'aggregate',\n", " 'align',\n", " 'all',\n", " 'any',\n", " 'append',\n", " 'apply',\n", " 'applymap',\n", " 'as_matrix',\n", " 'asfreq',\n", " 'asof',\n", " 'assign',\n", " 'astype',\n", " 'at',\n", " 'at_time',\n", " 
'axes',\n", " 'between_time',\n", " 'bfill',\n", " 'bool',\n", " 'boxplot',\n", " 'clip',\n", " 'clip_lower',\n", " 'clip_upper',\n", " 'columns',\n", " 'combine',\n", " 'combine_first',\n", " 'compound',\n", " 'copy',\n", " 'corr',\n", " 'corrwith',\n", " 'count',\n", " 'cov',\n", " 'cummax',\n", " 'cummin',\n", " 'cumprod',\n", " 'cumsum',\n", " 'describe',\n", " 'diff',\n", " 'div',\n", " 'divide',\n", " 'dot',\n", " 'drop',\n", " 'drop_duplicates',\n", " 'dropna',\n", " 'dtypes',\n", " 'duplicated',\n", " 'empty',\n", " 'eq',\n", " 'equals',\n", " 'eval',\n", " 'ewm',\n", " 'expanding',\n", " 'fcst',\n", " 'ffill',\n", " 'fillna',\n", " 'filter',\n", " 'first',\n", " 'first_valid_index',\n", " 'floordiv',\n", " 'from_dict',\n", " 'from_records',\n", " 'ftypes',\n", " 'ge',\n", " 'get',\n", " 'get_dtype_counts',\n", " 'get_ftype_counts',\n", " 'get_values',\n", " 'groupby',\n", " 'gt',\n", " 'head',\n", " 'hist',\n", " 'iat',\n", " 'id',\n", " 'idxmax',\n", " 'idxmin',\n", " 'iloc',\n", " 'index',\n", " 'infer_objects',\n", " 'info',\n", " 'insert',\n", " 'interpolate',\n", " 'isin',\n", " 'isna',\n", " 'isnull',\n", " 'items',\n", " 'iteritems',\n", " 'iterrows',\n", " 'itertuples',\n", " 'ix',\n", " 'join',\n", " 'keys',\n", " 'kurt',\n", " 'kurtosis',\n", " 'last',\n", " 'last_valid_index',\n", " 'lat',\n", " 'le',\n", " 'loc',\n", " 'lon',\n", " 'lookup',\n", " 'lt',\n", " 'mad',\n", " 'mask',\n", " 'max',\n", " 'mean',\n", " 'median',\n", " 'melt',\n", " 'memory_usage',\n", " 'merge',\n", " 'min',\n", " 'mod',\n", " 'mode',\n", " 'mul',\n", " 'multiply',\n", " 'ndim',\n", " 'ne',\n", " 'nlargest',\n", " 'notna',\n", " 'notnull',\n", " 'nsmallest',\n", " 'nunique',\n", " 'pct_change',\n", " 'pipe',\n", " 'pivot',\n", " 'pivot_table',\n", " 'plot',\n", " 'pop',\n", " 'pow',\n", " 'prod',\n", " 'product',\n", " 'quantile',\n", " 'query',\n", " 'radd',\n", " 'rank',\n", " 'rdiv',\n", " 'reindex',\n", " 'reindex_axis',\n", " 'reindex_like',\n", " 'rename',\n", " 
'rename_axis',\n", " 'reorder_levels',\n", " 'replace',\n", " 'resample',\n", " 'reset_index',\n", " 'rfloordiv',\n", " 'rmod',\n", " 'rmul',\n", " 'rolling',\n", " 'round',\n", " 'rpow',\n", " 'rsub',\n", " 'rtruediv',\n", " 'sample',\n", " 'select',\n", " 'select_dtypes',\n", " 'sem',\n", " 'set_axis',\n", " 'set_index',\n", " 'shape',\n", " 'shift',\n", " 'size',\n", " 'skew',\n", " 'slice_shift',\n", " 'sort_index',\n", " 'sort_values',\n", " 'squeeze',\n", " 'stack',\n", " 'std',\n", " 'style',\n", " 'sub',\n", " 'subtract',\n", " 'sum',\n", " 'swapaxes',\n", " 'swaplevel',\n", " 'tail',\n", " 'take',\n", " 'to_clipboard',\n", " 'to_csv',\n", " 'to_dense',\n", " 'to_dict',\n", " 'to_excel',\n", " 'to_feather',\n", " 'to_gbq',\n", " 'to_hdf',\n", " 'to_html',\n", " 'to_json',\n", " 'to_latex',\n", " 'to_msgpack',\n", " 'to_panel',\n", " 'to_parquet',\n", " 'to_period',\n", " 'to_pickle',\n", " 'to_records',\n", " 'to_sparse',\n", " 'to_sql',\n", " 'to_stata',\n", " 'to_string',\n", " 'to_timestamp',\n", " 'to_xarray',\n", " 'transform',\n", " 'transpose',\n", " 'truediv',\n", " 'truncate',\n", " 'tshift',\n", " 'tz_convert',\n", " 'tz_localize',\n", " 'u10_b',\n", " 'u10_m',\n", " 'udir_b',\n", " 'udir_m',\n", " 'unstack',\n", " 'update',\n", " 'values',\n", " 'var',\n", " 'where',\n", " 'xs']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dir(both)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hs_b, Hs_m, T, Tp_b, Tp_m, abs, add, add_prefix, add_suffix, agg, aggregate, align, all, any, append, apply, applymap, as_matrix, asfreq, asof, assign, astype, at, at_time, axes, between_time, bfill, bool, boxplot, clip, clip_lower, clip_upper, columns, combine, combine_first, compound, copy, corr, corrwith, count, cov, cummax, cummin, cumprod, cumsum, describe, diff, div, divide, dot, drop, 
drop_duplicates, dropna, dtypes, duplicated, empty, eq, equals, eval, ewm, expanding, fcst, ffill, fillna, filter, first, first_valid_index, floordiv, from_dict, from_records, ftypes, ge, get, get_dtype_counts, get_ftype_counts, get_values, groupby, gt, head, hist, iat, id, idxmax, idxmin, iloc, index, infer_objects, info, insert, interpolate, isin, isna, isnull, items, iteritems, iterrows, itertuples, ix, join, keys, kurt, kurtosis, last, last_valid_index, lat, le, loc, lon, lookup, lt, mad, mask, max, mean, median, melt, memory_usage, merge, min, mod, mode, mul, multiply, ndim, ne, nlargest, notna, notnull, nsmallest, nunique, pct_change, pipe, pivot, pivot_table, plot, pop, pow, prod, product, quantile, query, radd, rank, rdiv, reindex, reindex_axis, reindex_like, rename, rename_axis, reorder_levels, replace, resample, reset_index, rfloordiv, rmod, rmul, rolling, round, rpow, rsub, rtruediv, sample, select, select_dtypes, sem, set_axis, set_index, shape, shift, size, skew, slice_shift, sort_index, sort_values, squeeze, stack, std, style, sub, subtract, sum, swapaxes, swaplevel, tail, take, to_clipboard, to_csv, to_dense, to_dict, to_excel, to_feather, to_gbq, to_hdf, to_html, to_json, to_latex, to_msgpack, to_panel, to_parquet, to_period, to_pickle, to_records, to_sparse, to_sql, to_stata, to_string, to_timestamp, to_xarray, transform, transpose, truediv, truncate, tshift, tz_convert, tz_localize, u10_b, u10_m, udir_b, udir_m, unstack, update, values, var, where, xs\n" ] } ], "source": [ "methods=[x for x in dir(both) if not x.startswith('_')]\n", "print(', '.join(methods))" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
       u10_m       udir_m      Hs_m        Tp_m        fcst   lat    lon         udir_b      u10_b       Hs_b        Tp_b
count  116.000000  116.000000  116.000000  116.000000  116.0  116.0  116.000000  116.000000  116.000000  114.000000  114.000000
mean   4.934483    210.379310  0.546552    7.294828    0.0    31.4   -80.867996  209.034483  4.875862    0.562018    5.947368
std    1.573924    59.233166   0.137955    1.670555    0.0    0.0    0.000000    56.027777   2.001505    0.144051    2.663560
min    1.100000    15.000000   0.300000    3.100000    0.0    31.4   -80.867996  15.000000   0.100000    0.310000    2.000000
25%    3.700000    170.750000  0.500000    7.075000    0.0    31.4   -80.867996  181.000000  3.500000    0.462500    4.000000
50%    5.100000    210.000000  0.600000    8.000000    0.0    31.4   -80.867996  211.000000  4.950000    0.535000    5.500000
75%    6.125000    260.250000  0.600000    8.400000    0.0    31.4   -80.867996  244.500000  6.200000    0.637500    8.000000
max    8.700000    336.000000  0.900000    8.900000    0.0    31.4   -80.867996  356.000000  10.000000   0.960000    14.000000
\n", "
" ], "text/plain": [ " u10_m udir_m Hs_m Tp_m fcst lat \\\n", "count 116.000000 116.000000 116.000000 116.000000 116.0 116.0 \n", "mean 4.934483 210.379310 0.546552 7.294828 0.0 31.4 \n", "std 1.573924 59.233166 0.137955 1.670555 0.0 0.0 \n", "min 1.100000 15.000000 0.300000 3.100000 0.0 31.4 \n", "25% 3.700000 170.750000 0.500000 7.075000 0.0 31.4 \n", "50% 5.100000 210.000000 0.600000 8.000000 0.0 31.4 \n", "75% 6.125000 260.250000 0.600000 8.400000 0.0 31.4 \n", "max 8.700000 336.000000 0.900000 8.900000 0.0 31.4 \n", "\n", " lon udir_b u10_b Hs_b Tp_b \n", "count 116.000000 116.000000 116.000000 114.000000 114.000000 \n", "mean -80.867996 209.034483 4.875862 0.562018 5.947368 \n", "std 0.000000 56.027777 2.001505 0.144051 2.663560 \n", "min -80.867996 15.000000 0.100000 0.310000 2.000000 \n", "25% -80.867996 181.000000 3.500000 0.462500 4.000000 \n", "50% -80.867996 211.000000 4.950000 0.535000 5.500000 \n", "75% -80.867996 244.500000 6.200000 0.637500 8.000000 \n", "max -80.867996 356.000000 10.000000 0.960000 14.000000 " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "both[both.fcst==0].describe()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Help is available for the object's bound methods

" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method corr in module pandas.core.frame:\n", "\n", "corr(method='pearson', min_periods=1) method of pandas.core.frame.DataFrame instance\n", " Compute pairwise correlation of columns, excluding NA/null values\n", " \n", " Parameters\n", " ----------\n", " method : {'pearson', 'kendall', 'spearman'}\n", " * pearson : standard correlation coefficient\n", " * kendall : Kendall Tau correlation coefficient\n", " * spearman : Spearman rank correlation\n", " min_periods : int, optional\n", " Minimum number of observations required per pair of columns\n", " to have a valid result. Currently only available for pearson\n", " and spearman correlation\n", " \n", " Returns\n", " -------\n", " y : DataFrame\n", "\n" ] } ], "source": [ "help(both.corr)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Suppose we want to look at the correlation coefficient
\n", "for one forecast grouped by day

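
To make the grouped correlation concrete, here is a small sketch that computes the daily Pearson correlation by looping over the daily groups explicitly. The frame `demo` and its values are invented for illustration; only the column names `Hs_m` and `Hs_b` follow the merged DataFrame.

```python
import pandas

# Hypothetical 6-hourly values covering two days (four samples per day)
times = pandas.date_range('2018-06-01', periods=8,
                          freq=pandas.Timedelta(hours=6))
demo = pandas.DataFrame(
    {'Hs_m': [0.5, 0.6, 0.7, 0.8, 0.9, 0.8, 0.7, 0.6],
     'Hs_b': [0.4, 0.5, 0.6, 0.7, 0.6, 0.7, 0.8, 0.9]},
    index=times)

# Collect one correlation coefficient per calendar day
daily_corr = {}
for day, chunk in demo.groupby(pandas.Grouper(freq='D')):
    daily_corr[day] = chunk['Hs_m'].corr(chunk['Hs_b'])

daily_corr = pandas.Series(daily_corr)
print(daily_corr)
```

On the first day the two columns rise together and on the second they move in opposite directions, so the coefficients come out close to +1 and -1. With only a handful of samples per day, as in the real data, such daily values should be read with caution.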
" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "corr=both['Hs_m'][both.fcst==0].groupby(pandas.Grouper(freq='D')).corr(both['Hs_b'])" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "datetime\n", "2018-06-01 0.799368\n", "2018-06-02 0.617802\n", "2018-06-03 0.985839\n", "2018-06-04 0.846649\n", "2018-06-05 0.310881\n", "Freq: D, Name: Hs_m, dtype: float64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corr.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Another way is to select a parameter and forecast to validate

" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "model=both[both.fcst==0]['Hs_m'].copy()\n", "obs=both[both.fcst==0]['Hs_b'].copy()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "

To aggregate the data by day, we need to \"group\" the data
(think of looping through and collecting your data into chunks)
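
The chunking idea can be sketched on a tiny, made-up Series (`obs_demo` is a hypothetical name and the values are illustrative). Iterating over the grouped object shows which samples fall into each daily chunk before any statistic is applied.

```python
import pandas

# Hypothetical observations every 12 hours over three days
times = pandas.date_range('2018-06-01', periods=6,
                          freq=pandas.Timedelta(hours=12))
obs_demo = pandas.Series([0.5, 0.7, 0.6, 0.8, 0.4, 0.6], index=times)

grouped = obs_demo.groupby(pandas.Grouper(freq='D'))

# Each iteration yields the day label and the samples collected for that day
for day, chunk in grouped:
    print(day.date(), list(chunk.values))

# An aggregation is then applied once per chunk
daily_mean = grouped.mean()
print(daily_mean)
```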

" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "obsgrp=obs.groupby(pandas.Grouper(freq='D'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Calculate some basic stats, grouped by day
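
As a worked sketch with invented numbers (`model_demo` and `obs_demo` are hypothetical), the daily bias is the mean of the model-minus-observation differences, and the daily RMSE is the square root of the mean squared difference.

```python
import pandas

# Hypothetical model and observed values, two samples per day
times = pandas.date_range('2018-06-01', periods=4,
                          freq=pandas.Timedelta(hours=12))
model_demo = pandas.Series([0.6, 0.8, 0.5, 0.7], index=times)
obs_demo = pandas.Series([0.5, 0.7, 0.7, 0.5], index=times)

diff_demo = model_demo - obs_demo

# Daily bias: mean error per day
bias_demo = diff_demo.groupby(pandas.Grouper(freq='D')).mean()

# Daily RMSE: square root of the mean squared error per day
rmse_demo = (diff_demo ** 2).groupby(pandas.Grouper(freq='D')).mean() ** 0.5

# Number of non-missing pairs that went into each day
count_demo = diff_demo.groupby(pandas.Grouper(freq='D')).count()

print(bias_demo, rmse_demo, count_demo, sep=' | ')
```

On the second day the errors are -0.2 and +0.2, so the bias cancels to zero while the RMSE stays at 0.2, which is why both statistics are worth keeping.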

" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[(Timestamp('2018-06-01 00:00:00', freq='D'), 3),\n", " (Timestamp('2018-06-02 00:00:00', freq='D'), 7),\n", " (Timestamp('2018-06-03 00:00:00', freq='D'), 11),\n", " (Timestamp('2018-06-04 00:00:00', freq='D'), 15),\n", " (Timestamp('2018-06-05 00:00:00', freq='D'), 19),\n", " (Timestamp('2018-06-06 00:00:00', freq='D'), 23),\n", " (Timestamp('2018-06-07 00:00:00', freq='D'), 27),\n", " (Timestamp('2018-06-08 00:00:00', freq='D'), 31),\n", " (Timestamp('2018-06-09 00:00:00', freq='D'), 35),\n", " (Timestamp('2018-06-10 00:00:00', freq='D'), 39)]" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "diff=model-obs\n", "diffgroup=diff.groupby(pandas.Grouper(freq='D'))\n", "[v for v in diffgroup.groups.items()][:10]" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "diff2group=(diff**2).groupby(pandas.Grouper(freq='D'))\n", "count=diffgroup.count()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "datetime\n", "2018-06-01 0.0400\n", "2018-06-02 0.0925\n", "2018-06-03 -0.1425\n", "2018-06-04 -0.0300\n", "2018-06-05 -0.0675\n", "Freq: D, dtype: float64" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bias=diffgroup.mean()\n", "bias.head()" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "datetime\n", "2018-06-01 0.094163\n", "2018-06-02 0.115866\n", "2018-06-03 0.144827\n", "2018-06-04 0.050498\n", "2018-06-05 0.092060\n", "Freq: D, dtype: float64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"rmse=diff2group.mean()**0.5\n", "rmse.head()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "datetime\n", "2018-06-01 13.350430\n", "2018-06-02 21.144840\n", "2018-06-03 16.770537\n", "2018-06-04 9.358023\n", "2018-06-05 20.958924\n", "Freq: D, dtype: float64" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scatter_index=100.*(rmse - bias**2)/obsgrp.mean()\n", "scatter_index.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Notice these are Series
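
Definitions of the scatter index vary between groups. One commonly used form removes the bias from the RMSE in quadrature and normalizes by the mean observation, which keeps the units consistent. The sketch below assumes that definition; it is not necessarily the formula intended in this notebook, and all names and values are illustrative.

```python
import pandas

# Hypothetical model and observed values, two samples per day
times = pandas.date_range('2018-06-01', periods=4,
                          freq=pandas.Timedelta(hours=12))
model_demo = pandas.Series([0.6, 0.8, 0.5, 0.5], index=times)
obs_demo = pandas.Series([0.5, 0.5, 0.7, 0.3], index=times)

diff_demo = model_demo - obs_demo
bias_demo = diff_demo.groupby(pandas.Grouper(freq='D')).mean()
rmse_demo = (diff_demo ** 2).groupby(pandas.Grouper(freq='D')).mean() ** 0.5
obs_mean = obs_demo.groupby(pandas.Grouper(freq='D')).mean()

# Assumed definition: SI = 100 * sqrt(RMSE^2 - bias^2) / mean(obs)
# clip() guards against tiny negative values caused by rounding
unbiased_sq = (rmse_demo ** 2 - bias_demo ** 2).clip(lower=0)
si_demo = 100.0 * unbiased_sq ** 0.5 / obs_mean

# The result is still a Series indexed by day
print(type(si_demo).__name__)
print(si_demo)
```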

" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "

It would be nice to have a DataFrame of the statistics
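
A minimal sketch of the idea, using made-up values: Series that share an index line up as columns when passed to the DataFrame constructor in a dict, and a scalar such as the forecast length is broadcast to every row. The names `bias_demo`, `rmse_demo` and `stats_demo` are hypothetical.

```python
import pandas

# Hypothetical daily statistics sharing the same daily index
days = pandas.date_range('2018-06-01', periods=3,
                         freq=pandas.Timedelta(days=1))
bias_demo = pandas.Series([0.04, 0.09, -0.14], index=days)
rmse_demo = pandas.Series([0.09, 0.12, 0.14], index=days)

# Series become columns aligned on the index; the scalar fills every row
stats_demo = pandas.DataFrame({'Forecast': 0,
                               'Bias': bias_demo,
                               'RMSE': rmse_demo})
print(stats_demo)
```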

" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "fcst=0\n", "agg_stats=pandas.DataFrame({'Forecast':fcst,\n", " 'Bias':bias,\n", " 'RMSE':rmse,\n", " 'Corr':corr,\n", " 'Scatter_Index':scatter_index,\n", " 'Count':count})" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datetime    Forecast  Bias     RMSE      Corr      Scatter_Index  Count
2018-06-01  0         0.0400   0.094163  0.799368  13.350430      3
2018-06-02  0         0.0925   0.115866  0.617802  21.144840      4
2018-06-03  0         -0.1425  0.144827  0.985839  16.770537      4
2018-06-04  0         -0.0300  0.050498  0.846649  9.358023       4
2018-06-05  0         -0.0675  0.092060  0.310881  20.958924      4
\n", "
" ], "text/plain": [ " Forecast Bias RMSE Corr Scatter_Index Count\n", "datetime \n", "2018-06-01 0 0.0400 0.094163 0.799368 13.350430 3\n", "2018-06-02 0 0.0925 0.115866 0.617802 21.144840 4\n", "2018-06-03 0 -0.1425 0.144827 0.985839 16.770537 4\n", "2018-06-04 0 -0.0300 0.050498 0.846649 9.358023 4\n", "2018-06-05 0 -0.0675 0.092060 0.310881 20.958924 4" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agg_stats.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Looks like something that might be nice to keep around?

" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import sqlite3" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "dbfile='my_stats_table.db'\n", "conn = sqlite3.connect(dbfile,detect_types=sqlite3.PARSE_DECLTYPES)\n", "agg_stats.to_sql('STATS',conn,if_exists='append')\n", "conn.close()\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

To read it back in later
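
The round trip can be sketched with an in-memory SQLite database so nothing is written to disk. The table contents below are invented, and the use of `parse_dates` and `index_col` is one possible way (an assumption, not the approach shown in this notebook) to get the dates back as a datetime index instead of a plain column.

```python
import sqlite3
import pandas

# Hypothetical statistics table with a named datetime index
days = pandas.to_datetime(['2018-06-01', '2018-06-02'])
stats_demo = pandas.DataFrame({'Bias': [0.04, 0.09], 'Count': [3, 4]},
                              index=days)
stats_demo.index.name = 'datetime'

# An in-memory database stands in for the .db file
conn = sqlite3.connect(':memory:')
stats_demo.to_sql('STATS', conn, if_exists='append')

# Parse the stored dates and restore them as the index when reading back
restored = pandas.read_sql('select * from STATS', conn,
                           parse_dates=['datetime'], index_col='datetime')
conn.close()

print(restored)
```

Note that `if_exists='append'` adds rows each time the write is repeated, so rerunning the write against a persistent file would store duplicate days.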

" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "conn = sqlite3.connect(dbfile,detect_types=sqlite3.PARSE_DECLTYPES)\n", "mystats=pandas.read_sql('select * from STATS',conn)\n", "conn.close()" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   datetime    Forecast  Bias     RMSE      Corr      Scatter_Index  Count
0  2018-06-01  0         0.0400   0.094163  0.799368  13.350430      3
1  2018-06-02  0         0.0925   0.115866  0.617802  21.144840      4
2  2018-06-03  0         -0.1425  0.144827  0.985839  16.770537      4
3  2018-06-04  0         -0.0300  0.050498  0.846649  9.358023       4
4  2018-06-05  0         -0.0675  0.092060  0.310881  20.958924      4
\n", "
" ], "text/plain": [ " datetime Forecast Bias RMSE Corr Scatter_Index Count\n", "0 2018-06-01 0 0.0400 0.094163 0.799368 13.350430 3\n", "1 2018-06-02 0 0.0925 0.115866 0.617802 21.144840 4\n", "2 2018-06-03 0 -0.1425 0.144827 0.985839 16.770537 4\n", "3 2018-06-04 0 -0.0300 0.050498 0.846649 9.358023 4\n", "4 2018-06-05 0 -0.0675 0.092060 0.310881 20.958924 4" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mystats.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n", "

Thank You!



\n", "

Any Questions?

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }