{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Web scraping for PcDeMaNo\n",
    "Scraping is often the only way to get values from websites that don't provide an API. Finding the right values can be tricky, but this example should help you get started. The workflow is very similar to the one the [`scrape` sensor](https://home-assistant.io/components/sensor.scrape/) uses."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get the value"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import the required modules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "from bs4 import BeautifulSoup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# POLLUTION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "URL = 'http://gestiona.madrid.org/azul_internet/html/web/DatosEstacionAccion.icm?ESTADO_MENU=2&idEstacion=12'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The page is retrieved with `requests` and parsed with `BeautifulSoup`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_html = requests.get(URL).text\n",
    "data = BeautifulSoup(raw_html, 'html.parser')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now you have the complete content of the page. [CSS selectors](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors) can be used to identify the value you are after. There are several ways to get to the part in question. Because `select()` returns all matches as a list, you only need to find the position of the wanted element in that list. The next cell prints the first 50 `<td>` elements, each with its 1-based position."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 : <td width=\"89\">\n",
      "<img alt=\"Comunidad de Madrid - madrid.org\" height=\"64\" src=\"/webutils/logoCM-izq-89x64.png\" title=\"Comunidad de Madrid - madrid.org\" width=\"89\"/>\n",
      "</td>\n",
      "2 : <td class=\"nihil\" width=\"15\">\r\n",
      "       \r\n",
      "    </td>\n",
      "3 : <td>\n",
      "<table border=\"0\" cellpadding=\"0\" cellspacing=\"0\" width=\"100%\">\n",
      "<tr>\n",
      "<td class=\"txt08gr3\" colspan=\"2\">\r\n",
      "            D.G. del Medio Ambiente | Consejería de Medio Ambiente, Administración Local y Ordenación del Territorio\r\n",
      "          </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td class=\"titGeo11azu\">\r\n",
      "            Área de Calidad Atmosférica - Red de Calidad del Aire\r\n",
      "          </td>\n",
      "<td id=\"tdalerta\">\n",
      "</td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td class=\"puntos\" colspan=\"2\">\n",
      "</td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td colspan=\"2\" width=\"100%\">\n",
      "<a href=\"AvisosAccion.icm?ESTADO_MENU=1\" id=\"navegacion\">Inicio</a>\n",
      "<span id=\"navegacion\"> &gt; </span>\n",
      "<span id=\"navegacion\">Datos de la Red</span>\n",
      "</td>\n",
      "</tr>\n",
      "</table>\n",
      "</td>\n",
      "4 : <td class=\"txt08gr3\" colspan=\"2\">\r\n",
      "            D.G. del Medio Ambiente | Consejería de Medio Ambiente, Administración Local y Ordenación del Territorio\r\n",
      "          </td>\n",
      "5 : <td class=\"titGeo11azu\">\r\n",
      "            Área de Calidad Atmosférica - Red de Calidad del Aire\r\n",
      "          </td>\n",
      "6 : <td id=\"tdalerta\">\n",
      "</td>\n",
      "7 : <td class=\"puntos\" colspan=\"2\">\n",
      "</td>\n",
      "8 : <td colspan=\"2\" width=\"100%\">\n",
      "<a href=\"AvisosAccion.icm?ESTADO_MENU=1\" id=\"navegacion\">Inicio</a>\n",
      "<span id=\"navegacion\"> &gt; </span>\n",
      "<span id=\"navegacion\">Datos de la Red</span>\n",
      "</td>\n",
      "9 : <td width=\"89\">\n",
      "<a href=\"http://www.madrid.org/transparencia\" target=\"_blank\">\n",
      "<img alt=\"Portal de transparencia\" height=\"64\" src=\"../../images/portal/transparencia.jpg\" title=\"Portal de transparencia\" width=\"89\"/>\n",
      "</a>\n",
      "</td>\n",
      "10 : <td class=\"linea\" colspan=\"5\"> </td>\n",
      "11 : <td class=\"nihil\" colspan=\"3\" height=\"12\"> </td>\n",
      "12 : <td class=\"txt11neg\" colspan=\"2\">\n",
      "<strong>\n",
      "        Estación de Majadahonda\n",
      "      </strong>\n",
      "</td>\n",
      "13 : <td align=\"center\" class=\"txt11bla\" colspan=\"2\" id=\"fondoGrisMedio\">\n",
      "<strong>\n",
      "        Ultima media horaria a las\n",
      "        13:00\n",
      "      </strong>\n",
      "</td>\n",
      "14 : <td valign=\"top\">\n",
      "<table align=\"center\" valign=\"top\" width=\"100%\">\n",
      "<tr>\n",
      "<td class=\"txt11azu\" colspan=\"2\" id=\"fondoGris\">Contaminantes          </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoVainilla\">\n",
      "              TIN\n",
      "              <small>\n",
      "                (\n",
      "                ºC\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\" bgcolor=\"\">\n",
      "              17.2\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoBeige\">\n",
      "              NO\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\" bgcolor=\"\">\n",
      "              7\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoVainilla\">\n",
      "              NO2\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\" bgcolor=\"\">\n",
      "              7\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoBeige\">\n",
      "              PM10\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\" bgcolor=\"\">\n",
      "              4\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoVainilla\">\n",
      "              NOX\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\" bgcolor=\"\">\n",
      "              ***\n",
      "              N\n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoBeige\">\n",
      "              O3\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\" bgcolor=\"\">\n",
      "              75\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "</table>\n",
      "</td>\n",
      "15 : <td class=\"txt11azu\" colspan=\"2\" id=\"fondoGris\">Contaminantes          </td>\n",
      "16 : <td id=\"fondoVainilla\">\n",
      "              TIN\n",
      "              <small>\n",
      "                (\n",
      "                ºC\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "17 : <td align=\"right\" bgcolor=\"\">\n",
      "              17.2\n",
      "              \n",
      "            </td>\n",
      "18 : <td id=\"fondoBeige\">\n",
      "              NO\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "19 : <td align=\"right\" bgcolor=\"\">\n",
      "              7\n",
      "              \n",
      "            </td>\n",
      "20 : <td id=\"fondoVainilla\">\n",
      "              NO2\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "21 : <td align=\"right\" bgcolor=\"\">\n",
      "              7\n",
      "              \n",
      "            </td>\n",
      "22 : <td id=\"fondoBeige\">\n",
      "              PM10\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "23 : <td align=\"right\" bgcolor=\"\">\n",
      "              4\n",
      "              \n",
      "            </td>\n",
      "24 : <td id=\"fondoVainilla\">\n",
      "              NOX\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "25 : <td align=\"right\" bgcolor=\"\">\n",
      "              ***\n",
      "              N\n",
      "            </td>\n",
      "26 : <td id=\"fondoBeige\">\n",
      "              O3\n",
      "              <small>\n",
      "                (\n",
      "                µg/m<sup>3</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "27 : <td align=\"right\" bgcolor=\"\">\n",
      "              75\n",
      "              \n",
      "            </td>\n",
      "28 : <td valign=\"top\">\n",
      "<table align=\"center\" width=\"100%\">\n",
      "<tr>\n",
      "<td class=\"txt11azu\" colspan=\"2\" id=\"fondoGris\">Meteorología          </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoBeige\">\n",
      "              VV\n",
      "              <small>\n",
      "                (\n",
      "                m/s\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\">\n",
      "              4.3\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoVainilla\">\n",
      "              DV\n",
      "              <small>\n",
      "                (\n",
      "                Grd\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\">\n",
      "              71\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoBeige\">\n",
      "              Tmp\n",
      "              <small>\n",
      "                (\n",
      "                ºC\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\">\n",
      "              14.7\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoVainilla\">\n",
      "              HR\n",
      "              <small>\n",
      "                (\n",
      "                %\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\">\n",
      "              61\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoBeige\">\n",
      "              Pre\n",
      "              <small>\n",
      "                (\n",
      "                mbar\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\">\n",
      "              938\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoVainilla\">\n",
      "              RS\n",
      "              <small>\n",
      "                (\n",
      "                W/m<sup>2</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\">\n",
      "              864\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "<tr>\n",
      "<td id=\"fondoBeige\">\n",
      "              Llu\n",
      "              <small>\n",
      "                (\n",
      "                l/m<sup>2</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "<td align=\"right\">\n",
      "              0.0\n",
      "              \n",
      "            </td>\n",
      "</tr>\n",
      "</table>\n",
      "</td>\n",
      "29 : <td class=\"txt11azu\" colspan=\"2\" id=\"fondoGris\">Meteorología          </td>\n",
      "30 : <td id=\"fondoBeige\">\n",
      "              VV\n",
      "              <small>\n",
      "                (\n",
      "                m/s\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "31 : <td align=\"right\">\n",
      "              4.3\n",
      "              \n",
      "            </td>\n",
      "32 : <td id=\"fondoVainilla\">\n",
      "              DV\n",
      "              <small>\n",
      "                (\n",
      "                Grd\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "33 : <td align=\"right\">\n",
      "              71\n",
      "              \n",
      "            </td>\n",
      "34 : <td id=\"fondoBeige\">\n",
      "              Tmp\n",
      "              <small>\n",
      "                (\n",
      "                ºC\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "35 : <td align=\"right\">\n",
      "              14.7\n",
      "              \n",
      "            </td>\n",
      "36 : <td id=\"fondoVainilla\">\n",
      "              HR\n",
      "              <small>\n",
      "                (\n",
      "                %\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "37 : <td align=\"right\">\n",
      "              61\n",
      "              \n",
      "            </td>\n",
      "38 : <td id=\"fondoBeige\">\n",
      "              Pre\n",
      "              <small>\n",
      "                (\n",
      "                mbar\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "39 : <td align=\"right\">\n",
      "              938\n",
      "              \n",
      "            </td>\n",
      "40 : <td id=\"fondoVainilla\">\n",
      "              RS\n",
      "              <small>\n",
      "                (\n",
      "                W/m<sup>2</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "41 : <td align=\"right\">\n",
      "              864\n",
      "              \n",
      "            </td>\n",
      "42 : <td id=\"fondoBeige\">\n",
      "              Llu\n",
      "              <small>\n",
      "                (\n",
      "                l/m<sup>2</sup>\n",
      "                )\n",
      "              </small>\n",
      "</td>\n",
      "43 : <td align=\"right\">\n",
      "              0.0\n",
      "              \n",
      "            </td>\n",
      "44 : <td class=\"nihil\" height=\"12\"> </td>\n",
      "45 : <td class=\"linea2\"></td>\n",
      "46 : <td class=\"nihil\" height=\"4\"> </td>\n",
      "47 : <td>\n",
      "<table border=\"0\" cellpadding=\"0\" cellspacing=\"0\" width=\"100%\">\n",
      "<tr><td height=\"37\" width=\"72\"><a href=\"http://www.w3.org/WAI/WCAG1A-Conformance\" target=\"_blank\" title=\"Accesibilidad\">\n",
      "<img alt=\"Icono de conformidad con el Nivel A, de las Directrices de Accesibilidad para el Contenido Web 1.0 del W3C-WAI\" class=\"wai\" height=\"20\" src=\"../../images/portal/wai_w3c.gif\" title=\"Icono de conformidad con el Nivel A, de las Directrices de Accesibilidad para el Contenido Web 1.0 del W3C-WAI\" width=\"57\"/></a></td>\n",
      "<td class=\"linea\" width=\"1\"></td>\n",
      "<td class=\"copy\" width=\"195\"><span class=\"txt06gr6\">Copyright © Comunidad de Madrid.</span></td>\n",
      "<td class=\"linea\" width=\"1\"></td>\n",
      "<td class=\"enlaces\">\n",
      "<a class=\"txt07gr6\" href=\"/cs/Satellite?pagename=ComunidadMadrid/Comunes/Presentacion/popUp&amp;language=es&amp;c=CM_Texto_FA&amp;cid=1109265462487\" onclick=\"openWin('/cs/Satellite?pagename=ComunidadMadrid/Comunes/Presentacion/popUp&amp;language=es&amp;c=CM_Texto_FA&amp;cid=1109265462487',696,520,100,100);return false;\" title=\"Aviso Legal\">Aviso Legal</a> <span class=\"txt07gr6\">|</span>\n",
      "<a class=\"txt07gr6\" href=\"/cs/Satellite?pagename=ComunidadMadrid/Comunes/Presentacion/popUp&amp;language=es&amp;c=CM_Texto_FA&amp;cid=1109265462508\" onclick=\"openWin('/cs/Satellite?pagename=ComunidadMadrid/Comunes/Presentacion/popUp&amp;language=es&amp;c=CM_Texto_FA&amp;cid=1109265462508',696,520,100,100);return false;\" title=\"Privacidad\">Privacidad</a> <span class=\"txt07gr6\">|</span>\n",
      "<a class=\"txt07gr6\" href=\"/cs/Satellite?pagename=ComunidadMadrid/Comunes/Presentacion/popUp&amp;language=es&amp;c=CM_Texto_FA&amp;cid=1109265462527\" onclick=\"openWin('/cs/Satellite?pagename=ComunidadMadrid/Comunes/Presentacion/popUp&amp;language=es&amp;c=CM_Texto_FA&amp;cid=1109265462527',696,520,100,100);return false;\" title=\"Contacto\">Contacto</a> <span class=\"txt07gr6\">|</span>\n",
      "<a class=\"txt07gr6\" href=\"/cs/Satellite?pagename=ComunidadMadrid/Comunes/Presentacion/popUp&amp;language=es&amp;c=CM_Texto_FA&amp;cid=1109266097450\" onclick=\"openWin('/cs/Satellite?pagename=ComunidadMadrid/Comunes/Presentacion/popUp&amp;language=es&amp;c=CM_Texto_FA&amp;cid=1109266097450',696,520,100,100);return false;\" title=\"Accesibilidad\">Accesibilidad</a>\n",
      "</td>\n",
      "</tr></table>\n",
      "</td>\n",
      "48 : <td height=\"37\" width=\"72\"><a href=\"http://www.w3.org/WAI/WCAG1A-Conformance\" target=\"_blank\" title=\"Accesibilidad\">\n",
      "<img alt=\"Icono de conformidad con el Nivel A, de las Directrices de Accesibilidad para el Contenido Web 1.0 del W3C-WAI\" class=\"wai\" height=\"20\" src=\"../../images/portal/wai_w3c.gif\" title=\"Icono de conformidad con el Nivel A, de las Directrices de Accesibilidad para el Contenido Web 1.0 del W3C-WAI\" width=\"57\"/></a></td>\n",
      "49 : <td class=\"linea\" width=\"1\"></td>\n",
      "50 : <td class=\"copy\" width=\"195\"><span class=\"txt06gr6\">Copyright © Comunidad de Madrid.</span></td>\n"
     ]
    }
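   ],
   "source": [
    "# Run the selector once and reuse the result instead of re-querying in every iteration.\n",
    "cells = data.select('td')\n",
    "# Print the first 50 cells with their 1-based position; slicing avoids an IndexError on shorter pages.\n",
    "for i, cell in enumerate(cells[:50], start=1):\n",
    "    print(i, ':', cell)"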
   ]
  },
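  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`:nth-of-type(x)` matches an element that is the `x`-th sibling of its tag type under the same parent, counting from 1. This is not the same as the position in the list printed above, which numbers every `<td>` on the page. To pick an entry from that list directly, index it instead, e.g. `data.select('td')[20]` for entry 21 (the NO2 reading)."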
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# POLLEN"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make a selector as robust as possible, anchor it on something unique, such as an `id` attribute or a link URL, rather than on a bare position in the page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "URL = 'http://www.madrid.org/cs/Satellite?cid=1265185300196&language=es&pagename=PortalSalud%2FPage%2FPTSA_pintarContenidoFinal&vest=1265185299945'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The page is retrieved with `requests` and parsed with `BeautifulSoup`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_html = requests.get(URL).text\n",
    "data = BeautifulSoup(raw_html, 'html.parser')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now you have the complete content of the page. [CSS selectors](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors) can be used to identify the value you are after. There are several ways to get to the part in question. Because `BeautifulSoup` returns the matches as a list, you only need to find the position of the wanted element in that list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 : <span>\n",
      "Consejería de Sanidad\n",
      "</span>\n",
      "2 : <span>Estás en</span>\n",
      "3 : <span style=\"color:blue\">Datos del día 12 de abril</span>\n",
      "4 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">Niveles </span>\n",
      "5 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: green; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-color: green; mso-style-textfill-fill-alpha: 100.0%\">MEDIOS</span>\n",
      "6 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\"> de polen de </span>\n",
      "7 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: green; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-color: green; mso-style-textfill-fill-alpha: 100.0%\">Plátano</span>\n",
      "8 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #ff9900; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-color: #FF9900; mso-style-textfill-fill-alpha: 100.0%\"> </span>\n",
      "9 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">con </span>\n",
      "10 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: green; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-color: green; mso-style-textfill-fill-alpha: 100.0%\">163</span>\n",
      "11 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\"> </span>\n",
      "12 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">granos de polen por metro cúbico de aire</span>\n",
      "13 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">con un máximo de </span>\n",
      "14 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #ff9900; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-color: #FF9900; mso-style-textfill-fill-alpha: 100.0%\">676</span>\n",
      "15 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #ff9900; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-color: #FF9900; mso-style-textfill-fill-alpha: 100.0%\"> </span>\n",
      "16 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #ff9900; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-color: #FF9900; mso-style-textfill-fill-alpha: 100.0%\">granos en Alcalá de Henares</span>\n",
      "17 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">con un mínimo de </span>\n",
      "18 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">1 </span>\n",
      "19 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">granos de polen en </span>\n",
      "20 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">Las Rozas</span>\n",
      "21 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">Niveles BAJOS de polen de Gramíneas con </span>\n",
      "22 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">2 </span>\n",
      "23 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">granos de polen por metro cúbico<br>\n",
      "con un máximo de 5 granos en </br></span>\n",
      "24 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">Getafe</span>\n",
      "25 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">con un mínimo de </span>\n",
      "26 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">0 </span>\n",
      "27 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">granos de polen en Leganés</span>\n",
      "28 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">Niveles NULOS de polen de </span>\n",
      "29 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\">Plantago</span>\n",
      "30 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\"> con 0 granos de polen por metro cúbico</span>\n",
      "31 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\"><span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\"><span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>Las escalas para cada tipo de polen atienden únicamente a criterios </span><span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>aerobiológicos</span></span></span>\n",
      "32 : <span style=\"font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: Arial; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%\"><span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>Las escalas para cada tipo de polen atienden únicamente a criterios </span><span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>aerobiológicos</span></span>\n",
      "33 : <span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>Las escalas para cada tipo de polen atienden únicamente a criterios </span>\n",
      "34 : <span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>aerobiológicos</span>\n",
      "35 : <span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'><span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>Los </span><span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>niveles de concentración se expresan como granos de polen por metro cúbico de aire y corresponden a los datos de concentración medios para toda la Red </span><span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>Palinocam</span></span>\n",
      "36 : <span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>Los </span>\n",
      "37 : <span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>niveles de concentración se expresan como granos de polen por metro cúbico de aire y corresponden a los datos de concentración medios para toda la Red </span>\n",
      "38 : <span style='font-size: 12pt; font-family: Arial; font-weight: bold; color: #1f497d; language: es; mso-ascii-font-family: Arial; mso-fareast-font-family: +mn-ea; mso-bidi-font-family: \"Times New Roman\"; mso-fareast-theme-font: minor-fareast; mso-color-index: 3; mso-font-kerning: 12.0pt; mso-style-textfill-type: solid; mso-style-textfill-fill-themecolor: text2; mso-style-textfill-fill-color: #1F497D; mso-style-textfill-fill-alpha: 100.0%'>Palinocam</span>\n",
      "39 : <span><span>MAPA POLEN</span></span>\n",
      "40 : <span>MAPA POLEN</span>\n"
     ]
    }
   ],
   "source": [
    "# Print the first 40 <span> elements with their 1-based position.\n",
    "for i, span in enumerate(data.select('span')[:40], start=1):\n",
    "    print(i, \":\", span)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The value extraction is handled with `value_template` by the [`scrape` sensor](https://home-assistant.io/components/sensor.scrape/). The next two steps are shown here only to illustrate the manual process.\n",
    "\n",
    "We only need the actual text."
   ]
  },
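  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of this step, assuming the value of interest is one of the `<span>` elements listed above. The inline `sample_html` and the list index are illustrative stand-ins, not taken from the live page, so that the cell runs without network access."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bs4 import BeautifulSoup\n",
    "\n",
    "# Hypothetical stand-in for the downloaded page (three of the spans shown above).\n",
    "sample_html = '<p><span>con un mínimo de </span><span>0 </span><span>granos de polen en Leganés</span></p>'\n",
    "sample = BeautifulSoup(sample_html, 'html.parser')\n",
    "\n",
    "# select() returns a list of Tag objects; .text drops the markup and keeps the text.\n",
    "raw_value = sample.select('span')[1].text\n",
    "raw_value"
   ]
  },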
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a string and can be manipulated. We focus on the number."
   ]
  },
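  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to isolate the number, sketched under the assumption that the text contains a single integer surrounded by other characters. `raw_text` is a hypothetical example string modelled on the output above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Hypothetical text, modelled on one of the spans printed above.\n",
    "raw_text = ' con 0 granos de polen por metro cúbico'\n",
    "\n",
    "# Take the first run of digits; fall back to None if the text has none.\n",
    "match = re.search(r'\\d+', raw_text)\n",
    "number = int(match.group()) if match else None\n",
    "number"
   ]
  },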
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the number of the current platforms/components from the [Component overview](https://home-assistant.io/components/) which are available in Home Assistant.\n",
    "\n",
    "The details you identified here can be reused to configure the `select` option of the [`scrape` sensor](https://home-assistant.io/components/sensor.scrape/). The most efficient way to do so is to apply `nth-of-type(x)` to your selector."
   ]
  },
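  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of how `nth-of-type(x)` behaves in a selector, using a hypothetical HTML snippet instead of the live page. Note that `nth-of-type` counts among sibling elements of the same parent, which is not necessarily the same as the position in the flat list returned by `select('span')`, so the index found above may need adjusting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bs4 import BeautifulSoup\n",
    "\n",
    "# Hypothetical stand-in for the downloaded page.\n",
    "sample_html = '<p><span>con un mínimo de </span><span>0 </span><span>granos de polen en Leganés</span></p>'\n",
    "sample = BeautifulSoup(sample_html, 'html.parser')\n",
    "\n",
    "# Second <span> among the children of <p>; here this matches select('p span')[1].\n",
    "sample.select('p span:nth-of-type(2)')[0].text"
   ]
  },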
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}