Java Weka API: Adding List To Instances Object

This is just a quick one to save anyone else new to the Weka API in Java from spending as much time as I did figuring this out.

Let's suppose you have a Weka Instances object and a new list of values you want to add to it as a new attribute (Weka slang for a new column in your data).

The gist below shows a small reproducible example of this.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Add;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class wekaDev {

    public static void main(String[] args) throws Exception {
        // read in data
        DataSource source = new DataSource("C:\\Program Files\\Weka-3-8\\data\\iris.arff");
        Instances data = source.getDataSet();

        // look at data before
        System.out.println("\n====== BEFORE ======\n");
        printHead(data, 5);

        // make a list of random doubles, one per instance
        List<Double> list = new ArrayList<>(data.numInstances());
        Random rand = new Random();
        for (int i = 0; i < data.numInstances(); i++) {
            list.add(rand.nextDouble());
        }

        // get list of numbers into Instances object as new attribute
        Add filter = new Add();
        filter.setAttributeIndex("last"); // add as last so we know where it is
        filter.setAttributeName("randomDouble"); // give it a name
        filter.setInputFormat(data); // tell the filter the structure of the input data
        data = Filter.useFilter(data, filter); // use the filter

        // now update all the values
        for (int i = 0; i < data.numInstances(); i++) {
            data.instance(i).setValue(data.numAttributes() - 1, list.get(i)); // need to -1 as zero based index
        }

        // look at data after
        System.out.println("\n====== AFTER ======\n");
        printHead(data, 5);
    }

    // print the attribute names as a header, then the first n instances
    public static void printHead(Instances data, int n) {
        int numCols = data.numAttributes();
        for (int i = 0; i <= (numCols - 1); i++) {
            if (i < (numCols - 1)) {
                System.out.print(data.attribute(i).name() + ",");
            } else {
                System.out.print(data.attribute(i).name() + "\n");
            }
        }
        // never ask for more rows than the data holds
        for (int i = 0; i < Math.min(n, data.numInstances()); i++) {
            System.out.println(data.instance(i));
        }
    }
}

And you should see some output like this:

====== BEFORE ======
sepallength,sepalwidth,petallength,petalwidth,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5,3.6,1.4,0.2,Iris-setosa

====== AFTER ======
sepallength,sepalwidth,petallength,petalwidth,class,randomDouble
5.1,3.5,1.4,0.2,Iris-setosa,0.173003
4.9,3,1.4,0.2,Iris-setosa,0.304703
4.7,3.2,1.3,0.2,Iris-setosa,0.925626
4.6,3.1,1.5,0.2,Iris-setosa,0.733839
5,3.6,1.4,0.2,Iris-setosa,0.710073

Above we can see that the new attribute “randomDouble” has been added to the Instances object.
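One thing to note: because the example uses an unseeded Random, the numbers in the randomDouble column will differ from run to run. Below is a minimal sketch, in plain Java with no Weka dependency, of building a repeatable list with exactly one value per row and checking its length before writing it into the data. The class name SeededColumn, the helpers randomDoubles and requireOneValuePerRow, and the seed 42 are all made up for illustration; they are not part of the Weka API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper class for illustration only; not part of Weka.
public class SeededColumn {

    // Build exactly n random doubles in [0, 1); the same seed gives the same list.
    public static List<Double> randomDoubles(int n, long seed) {
        Random rand = new Random(seed);
        List<Double> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add(rand.nextDouble());
        }
        return values;
    }

    // Fail early if the list does not hold one value per row.
    public static void requireOneValuePerRow(List<Double> values, int numRows) {
        if (values.size() != numRows) {
            throw new IllegalArgumentException(
                "expected " + numRows + " values but got " + values.size());
        }
    }

    public static void main(String[] args) {
        int numRows = 150; // iris.arff has 150 instances
        List<Double> values = randomDoubles(numRows, 42L); // arbitrary seed
        requireOneValuePerRow(values, numRows);
        System.out.println(values.size() + " values, first = " + values.get(0));
    }
}
```

In the Weka example you would pass data.numInstances() as the row count, so the list and the Instances object cannot drift out of step before the setValue loop runs.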

I’m learning the Weka Java API at the moment, so I will try to post little tidbits like this that could be useful for others, or even myself 6 months from now 🙂
