Forecasting Bike Share Rental Demand with Automated Machine Learning

In this tutorial, you’ll create a time-series forecasting model for bike share rental demand using the automated machine learning (AutoML) capabilities of Azure Machine Learning. Every step is carried out in the studio interface, so you don’t need to write any code to follow along.

Prerequisites

To follow along, you’ll need:

  1. An Azure Machine Learning workspace. You can create one by following the Create workspace resources guide.
  2. The bike-no.csv data file, which contains the bike share rental data.

Sign in to Azure Machine Learning Studio

  1. Sign in to the Azure Machine Learning studio.
  2. Select your subscription and the workspace you created.
  3. Click Get started to access the studio.
  4. In the left pane, select Automated ML under the Author section.
  5. Click +New automated ML job to start a new experiment.

Create and Load Dataset

Before configuring your experiment, you’ll need to upload the bike share data as an Azure Machine Learning dataset.

  1. On the Select dataset form, choose From local files from the +Create dataset dropdown.
  2. On the Basic info form, provide a name and description for your dataset. Ensure the dataset type is set to Tabular.
  3. On the Datastore and file selection form, select the default datastore workspaceblobstore (Azure Blob Storage) and upload the bike-no.csv file.
  4. Verify the Settings and preview form is populated correctly and select Next.
  5. On the Schema form, exclude the casual and registered columns. They are a breakdown of cnt, the column you want to predict, so keeping them would leak the target into the training features.
  6. Confirm the dataset details and click Create to complete the dataset creation.
  7. Select your newly created dataset and click Next.
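
To see why the casual and registered columns are excluded, the following minimal Python sketch reproduces the Schema step outside the studio. It is only an illustration: the three sample rows are made up, the helper name drop_columns is hypothetical, and the real bike-no.csv file contains more columns than the four shown here.

```python
import csv
import io

# Made-up sample rows for illustration only; not taken from bike-no.csv.
SAMPLE_CSV = """date,casual,registered,cnt
2011-01-01,10,90,100
2011-01-02,25,175,200
2011-01-03,5,45,50
"""


def drop_columns(rows, ignored):
    """Return copies of the rows without the ignored column names."""
    return [
        {name: value for name, value in row.items() if name not in ignored}
        for row in rows
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# casual + registered adds up to cnt, so the two columns give away the target.
leaks_target = all(
    int(row["casual"]) + int(row["registered"]) == int(row["cnt"])
    for row in rows
)

cleaned = drop_columns(rows, {"casual", "registered"})
print(leaks_target)
print(cleaned[0])
```

In the sample, the two excluded columns sum exactly to cnt, which is why a model trained with them would learn nothing useful for forecasting.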

Configure the Experiment

  1. On the Configure job form, enter the experiment name automl-bikeshare and select the cnt column as the target to predict.
  2. Select compute cluster as your compute type and click +New to configure your compute target.
  3. In the Select virtual machine form, choose Dedicated for the virtual machine tier, CPU (Central Processing Unit) for the virtual machine type, and Standard_DS12_V2 for the virtual machine size.
  4. In the Configure settings form, provide a unique name for your compute context (e.g., bike-compute) and set the minimum and maximum nodes to 1 and 6, respectively.
  5. Click Create to provision the compute target, then select it from the dropdown.
  6. Click Next to proceed.

Specify Forecasting Settings

  1. On the Task type and settings form, select Time series forecasting as the machine learning task type.
  2. Choose date as the Time column and leave Time series identifiers blank.
  3. Keep the Frequency set to Autodetect.
  4. Deselect Autodetect for the forecast horizon and set it to 14.
  5. Click View additional configuration settings and populate the fields as follows:
    • Primary metric: Normalized root mean squared error
    • Explain best model: Enable
    • Blocked algorithms: Extreme Random Trees
    • Forecast target lags: None
    • Target rolling window size: None
    • Training job time (hours): 3
    • Max concurrent iterations: 6
  6. Click Save and then Next.
  7. On the [Optional] Validate and test form, select k-fold cross-validation as the Validation type and set the Number of cross validations to 5.
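
With time-ordered data, validation folds are commonly built so that each fold trains on the past and tests on the period that follows. The sketch below shows one such scheme, a rolling-origin split, using the forecast horizon of 14 and the 5 cross validations chosen above. It is an assumption-laden illustration rather than a description of how the service builds its folds: the function name rolling_origin_splits and the row count of 100 are hypothetical.

```python
def rolling_origin_splits(n_rows, horizon, n_folds):
    """Build (train, test) index ranges where each test window follows its training data.

    The last fold tests on the final `horizon` rows; each earlier fold
    moves the test window back by one horizon.
    """
    splits = []
    for fold in range(n_folds):
        test_end = n_rows - fold * horizon
        test_start = test_end - horizon
        if test_start <= 0:
            raise ValueError("not enough rows for the requested folds")
        splits.append((range(0, test_start), range(test_start, test_end)))
    splits.reverse()  # oldest fold first
    return splits


# Hypothetical dataset of 100 daily rows, horizon 14, 5 folds.
splits = rolling_origin_splits(n_rows=100, horizon=14, n_folds=5)
for train, test in splits:
    print(f"train rows 0..{train[-1]}, test rows {test[0]}..{test[-1]}")
```

Each fold never trains on rows that come after its test window, which is the property that matters when validating a forecast.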

Run the Experiment

Click Finish to start the automated ML experiment. Preparation takes 10-15 minutes; once the job is running, each model iteration takes a further 2-3 minutes.

Explore the Models

While the experiment is running, you can navigate to the Models tab to explore the algorithms (models) as they complete. Select the Algorithm name of a completed model to view its performance details, including the Overview and Metrics tabs.
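
As a rough guide to reading the primary metric, the sketch below computes a normalized root mean squared error by hand. It assumes the RMSE is normalized by the range of the observed target values; the studio’s exact computation may differ, and the function name normalized_rmse and the sample numbers are hypothetical.

```python
import math


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the observed values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / value_range


# Made-up daily rental counts and forecasts, for illustration only.
observed = [100, 200, 300, 400]
forecast = [110, 190, 310, 390]
score = normalized_rmse(observed, forecast)
print(round(score, 4))
```

Lower values are better, and because the error is scaled by the range of the target, scores can be compared across datasets with different rental volumes.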

Deploy the Best Model

Once the experiment is complete, the best model based on the Normalized root mean squared error metric will be displayed in the Best model summary section. To deploy this model as a web service:

  1. Select the best model to open the model-specific page.
  2. Click the Deploy button in the top-left area of the screen.
  3. In the Deploy a model pane, provide a name (e.g., bikeshare-deploy) and description for the deployment, select Azure Container Instance (ACI) as the compute type, and leave the other settings at their defaults.
  4. Click Deploy to start the deployment process, which will take approximately 20 minutes.

Next Steps

After the deployment is successful, you can learn how to consume the web service and test your predictions using Power BI’s built-in Azure Machine Learning support by following the Consume a web service guide.

You can also read more about how automated machine learning works and how to interpret automated machine learning results.

Source: Azure Machine Learning Studio Tutorial - Demand forecasting & AutoML