KNIME Analytics Platform: Machine Learning Made Easy
In this article we will look at open-source software that helps data scientists and data science enthusiasts solve complex problems with little or no coding knowledge. I will get you started with one such GUI-based tool: KNIME Analytics Platform. Before we start, if you are not familiar with the basics of KNIME and how to download it, you can find them here.
Data science covers many parts of the data world, including data preparation, cleaning, and modeling. By the end of this article, you will be able to predict NYC Airbnb rental prices without writing a single line of code! You can find a description of the dataset, and download it, from here. You can also import workflows, datasets, nodes, components and more from KNIME Hub.
How to set up a new project in KNIME
Before we delve more into how KNIME works, let’s define a few key terms to help us in our understanding and then see how to open up a new project in KNIME.
Node: A node is the basic processing point of any data manipulation. It can perform a number of actions, depending on which node you choose for your workflow.
Workflow: A workflow is the sequence of steps or actions you take in your platform to accomplish a particular task.
This is what your KNIME home screen looks like.
The workflow coach in the top left corner shows what percentage of the KNIME community recommends a particular node. The node repository displays all the nodes you can add to a workflow, depending on your needs. You can also go to “Browse Example Workflows” to check out more workflows once you have created your first one. Creating a workflow is the first step towards building a solution to any problem.
To set up a workflow, you can follow these steps.
Step 1: Go to the File menu, click on New and name the workflow “ML_project”. You can also specify the destination of the new workflow.
Step 2: Click on Finish, and you will have created your first KNIME workflow.
This is your blank Workflow on KNIME. Now, you’re ready to explore and solve any problem by dragging any node from the repository to your workflow.
1. Importing the data files
Let us start with the first, and a very important, step in understanding the problem: importing our data.
Drag and drop the “File Reader” node to the workflow and double click on it. Next, browse to the file you need to import into your workflow.
In this article we will be predicting the prices of NYC Airbnb rentals, so I will import the NYC Airbnb rental dataset.
This is what the preview looks like once you import the dataset. Click OK and execute the node.
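KNIME does this step without any code, but for readers curious about what reading a delimited file involves, here is a minimal Python sketch. The three columns and two rows are made up for illustration and are not taken from the actual dataset; the File Reader node's internals may differ.

```python
import csv
import io

# Hypothetical excerpt standing in for the CSV file on disk.
sample = (
    "name,room_type,price\n"
    "Cozy studio,Private room,85\n"
    "Sunny loft,Entire home/apt,210\n"
)

# Parse each line into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(sample)))

# csv yields strings only, so numeric columns need an explicit conversion.
for row in rows:
    row["price"] = int(row["price"])

print(rows[0])
```

The explicit conversion of the price column is the kind of type decision the File Reader preview lets you check before you execute the node.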
2. How do you clean your Data?
Before training your model, you can also include data cleaning and feature extraction in your approach. You can impute missing values using the Missing Value node, but KNIME also provides interactive data cleaning.
2.1 Finding Missing Values & Imputations
Before we impute values, we need to know which ones are missing.
Go to the node repository again, and find the node “Missing Value”. Drag and drop it, and connect the output of our File Reader to the node.
To impute values, select the Missing Value node and click Configure. Select the imputations that suit each type of data in your dataset, and click “Apply”.
Now when we execute it, our complete dataset with imputed values is ready in the output port of the “Missing Value” node. For my analysis, I have chosen the imputation methods as shown above.
You can choose from a variety of imputation techniques.
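To make the idea of type-dependent imputation concrete, here is a small Python sketch. It assumes two common strategies, the mean for numeric columns and the most frequent value for string columns; these are illustrative choices, not necessarily the ones used in this workflow, and the column names are hypothetical.

```python
from collections import Counter
from statistics import mean


def impute_column(values):
    """Fill None entries: mean for numeric columns, most frequent value otherwise."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn a fill value from
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


# Hypothetical table; None marks a missing cell.
table = {
    "reviews_per_month": [0.5, None, 1.5, 1.0],
    "room_type": ["Private room", "Entire home", None, "Private room"],
}
cleaned = {name: impute_column(col) for name, col in table.items()}
print(cleaned)
```

The fill value is computed from the non-missing cells only, which mirrors choosing one strategy per data type in the node's configuration dialog.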
2.2 Interactive Data Cleaning
This KNIME component allows you to apply various data cleaning steps interactively. The default configuration cleans missing values and outliers. You can drag and drop this component directly from KNIME Hub.
Available pre-processing steps:
(I) Automatic type guessing: determine the most specific type in each string column and change the column types accordingly.
(II) Treatment of missing values: separate configurations for missing values in string and number columns.
(III) Outlier removal: configuration on how to treat outliers.
After importing this KNIME component into your workflow, execute it, then right click on it and open the interactive view.
With interactive data cleaning you can perform different kinds of tasks, such as removing or imputing missing values, renaming columns and removing columns from the dataset. You can also explore the data in interactive mode.
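As an illustration of what outlier removal can mean, the sketch below applies the interquartile range (IQR) rule in Python. Whether the component uses exactly this rule is an assumption on my part, and the prices are made-up values.

```python
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical nightly prices with one extreme listing.
prices = [60, 75, 80, 90, 100, 110, 125, 5000]
kept = remove_outliers_iqr(prices)
print(kept)
```

The multiplier k controls how aggressive the filter is; 1.5 is a widely used convention, not a setting read from the component.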
3. Machine Learning modeling in KNIME
Let us take a look at how we would build a machine learning model in KNIME. After data cleaning, pre-processing and feature engineering, the first step of modeling is to partition the data. Go to the node repository and add the Partitioning node.
In the configuration of the Partitioning node, specify the size of the first partition, click OK and execute the node. You can choose from a variety of partitioning techniques.
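For intuition, a random partition can be sketched in a few lines of Python. The 80/20 split and the seed are assumptions chosen for illustration; the article does not state which size was used, and the node offers other techniques as well.

```python
import random


def partition(rows, first_fraction=0.8, seed=42):
    """Shuffle rows reproducibly, then split them into two partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 table rows
train, test = partition(rows)
print(len(train), len(test))
```

The first partition plays the role of the training data and the second is held back for evaluation, which matches the node's two output ports.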
3.2 Implementing a Random Forest Model
Go to the node repository again, and find the Random Forest Learner and Random Forest Predictor nodes. Drag and drop them, and connect them to the outputs of our Partitioning node as shown below.
In the Random Forest Learner configuration, specify Price as the target column, click OK and execute the node.
In the Random Forest Predictor configuration, you can change the prediction column name. Press OK and execute the node. To see the predictions, right click on the node and select the prediction output.
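The learner/predictor split can be illustrated with a deliberately simplified Python sketch. This is not KNIME's implementation and not a full random forest: it bags one-split regression trees on a single feature and leaves out the random feature selection a real random forest performs. The toy data is made up.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Every x in the sample is identical, so fall back to the mean.
        overall = mean(ys)
        return (xs[0], overall, overall)
    return best[1:]


def learn_forest(xs, ys, n_trees=25, seed=0):
    """'Learner' step: fit each stump on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """'Predictor' step: average the predictions of all stumps."""
    return mean(left if x <= threshold else right
                for threshold, left, right in forest)


# Hypothetical toy data: number of bedrooms -> nightly price.
bedrooms = [1, 1, 2, 2, 3, 3, 4, 4]
price = [80, 90, 120, 130, 200, 210, 300, 310]
forest = learn_forest(bedrooms, price)
print(predict_forest(forest, 1), predict_forest(forest, 4))
```

As in the workflow, the model is built once from the training partition and then applied separately, which is why KNIME uses two nodes.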
3.3 Model Evaluation
Go to the node repository and drag and drop the Numeric Scorer node. You can also use other model evaluation techniques; the node repository contains different nodes for this.
Right click on the Numeric Scorer and select Statistics to find the values of R² and the errors. This is the final workflow diagram that was obtained.
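If you want to see how such statistics are computed, here is a short Python sketch of R², mean absolute error and root mean squared error using their standard definitions. The actual and predicted prices are made-up numbers, not results from this workflow.

```python
from math import sqrt


def numeric_scores(actual, predicted):
    """Return R², mean absolute error and root mean squared error."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "rmse": sqrt(ss_res / n),
    }


# Hypothetical values for illustration only.
actual = [100, 150, 200, 250]
predicted = [110, 140, 210, 240]
scores = numeric_scores(actual, predicted)
print(scores)
```

An R² close to 1 means the predictions explain most of the variation in price, while the error measures are expressed in the same unit as the price itself.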
KNIME workflows are very handy when it comes to portability. They can be sent to your friends or colleagues to build on together, adding to the functionality of your product!
To export a KNIME workflow, click on File -> Export KNIME Workflow. Then select the workflow you need to export and click Finish! This creates a .knwf file that you can send to anyone, and they will be able to access it with one click!
KNIME is a platform that can be used for almost any kind of analysis. In this article, we explored how to import a dataset, clean it, and extract important features from it. We also built a predictive model, using a Random Forest to estimate Price and obtain the values of R² and the errors. Finally, we saw how to share our work with others.
I hope this tutorial has helped you uncover aspects of the KNIME Analytics Platform that you might have overlooked before. I will be posting more articles on KNIME and data science in the future.
Try this out, and ping me if you have any queries:
You can DM me on LinkedIn.
Till then, happy KNIMING!