
In this post, I go through the installations and requirements needed to run the benchmark model for DengAI, a competition hosted on DrivenData.

What is DrivenData?

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. DrivenData is an organization that hosts online data science competitions where the data and problem are posed by a socially-minded organization. You get to put your analytic skills to the test in order to tackle real-world problems with real-world impact. DengAI is a competition hosted by DrivenData in which competitors predict local epidemics of dengue fever. The blog post at this link presents a benchmark model provided by DrivenData against which competitors can compare their results.

Selecting the correct wheel file

The benchmark is modeled using Python (obviously) and Jupyter Notebook. Therefore the first thing you must have is Python on your machine. I am running this on a Windows machine, so I had to install some of the required Python packages from the wheel files on gohlke packages.

When selecting the required packages, be careful to pick the build that is compatible with your machine. For example, when I run python in my cmd, the startup banner shows a 32-bit Python 3.5 interpreter.

Therefore, when I have to get numpy, I select "numpy-1.12.1+mkl-cp35-cp35m-win32.whl" from the list. In the file name, cp35 stands for CPython 3.5 and win32 for a 32-bit Windows build.
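
If reading the interpreter banner is error-prone, the same information can be printed from Python itself. The snippet below is a minimal sketch that assumes CPython on Windows; the function name wheel_tags is my own and not part of any library.

```python
import struct
import sys


def wheel_tags():
    """Return the (python_tag, platform_tag) pair to look for in a wheel file name."""
    # The pointer size in bits tells whether the interpreter is a 32- or 64-bit build.
    bits = struct.calcsize("P") * 8
    # Assumption: CPython, whose wheels use the "cp" prefix plus major and minor version.
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Assumption: Windows, where wheels are tagged win32 or win_amd64.
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    python_tag, platform_tag = wheel_tags()
    print("Look for wheels containing:", python_tag, "and", platform_tag)
```

On the machine described above, this should point to the cp35 and win32 files, matching the numpy wheel picked by hand.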


Installing the packages

The benchmark requires a handful of libraries, which can be easily installed with pip from the command line as follows. It's not a problem if some of them are already installed.

1. Upgrade pip
python -m pip install --upgrade pip


2. Install Numpy
  • Download the correct wheel file from numpy on gohlke 
  • Open command line in the downloaded folder
  • Run pip install "numpy wheel filename" (e.g. pip install "numpy-1.12.1+mkl-cp35-cp35m-win32.whl")

3. Install pandas

pip install "pandas wheel filename"


4. Install SciPy

pip install "scipy wheel filename"


5. Install Statsmodels

pip install "statsmodels wheel filename"


6. Install Scikit-learn

pip install -U scikit-learn


7. Install Seaborn

pip install seaborn


8. Install Matplotlib

pip install matplotlib


9. Install Jupyter

pip install jupyter


10. Run Jupyter

jupyter notebook

Running this command starts the Jupyter server, which will be used to run our model. It prints a token, which you will be asked for when running the notebook.

Create and run a notebook

I am using JetBrains PyCharm to run this model, which allows users to easily manage Python scripts. In PyCharm, create a new project and, inside it, create a new Jupyter Notebook.

Once the notebook is created, you will be presented with the editor to manage the cells. Copy the In[1] cell from the benchmark post and paste it into a new cell. Then click on the run button to run the cell. You might be prompted to enter the token from Jupyter, as mentioned in step 10 above.


If the first cell runs successfully, it indicates that your library installations are successful. Keep adding the other cells and run the benchmark.
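
If the first cell fails and it is not obvious which library is the culprit, a quick check along these lines can narrow it down. This is only a sketch: the list of import names is my assumption based on the packages installed above (note that scikit-learn is imported as sklearn), and check_libraries is a made-up helper name.

```python
import importlib

# Import names of the libraries installed in the steps above (assumed list).
# scikit-learn is imported under the name "sklearn".
REQUIRED = ["numpy", "pandas", "scipy", "statsmodels", "sklearn", "seaborn", "matplotlib"]


def check_libraries(names):
    """Return a dict mapping each import name to its version, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to a placeholder.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_libraries(REQUIRED).items():
        print("{:<12} {}".format(name, version if version else "NOT INSTALLED"))
```

Any line reporting NOT INSTALLED points to the step above that needs to be repeated.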

Points to remember


  • It is always better to run each cell individually - which is exactly what Jupyter notebooks are useful for.
  • If filepaths are troublesome, try using absolute paths.
  • If there are exceptions and/or errors where the fix does not seem to work, try restarting the jupyter server.
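
For the file path tip, one way to build an absolute path is sketched below. The file name and folder in the usage example are hypothetical placeholders, not the names used by the benchmark, and absolute_data_path is a helper name I made up.

```python
import os


def absolute_data_path(filename, base_dir=None):
    """Build an absolute path to a data file, relative to base_dir or the current directory."""
    base = base_dir if base_dir is not None else os.getcwd()
    return os.path.abspath(os.path.join(base, filename))


if __name__ == "__main__":
    # Hypothetical file name and folder; replace them with your own download location.
    print(absolute_data_path("train.csv", "data"))
```

Printing the result before passing it to pandas makes it easy to see where the notebook is actually looking for the file.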


The following sample shows how JSch can be used to create an SSH session and connect to a remote database through it.

This sample Java project was created using Maven, and the MySQL and JSch dependencies in the pom.xml are as follows:

<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.6</version>
</dependency>

<!-- https://mvnrepository.com/artifact/com.jcraft/jsch -->
<dependency>
    <groupId>com.jcraft</groupId>
    <artifactId>jsch</artifactId>
    <version>0.1.54</version>
</dependency>

The connectSession() method creates an SSH session, sets up local port forwarding to the database port, and returns the session (so it can be disconnected later).
The main method then simply accesses the DB through the forwarded local port using a typical database connection.

package com.sachi;

import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import java.sql.*;

/**
 * Created by Sachi on 4/1/2017.
 */
public class JschTrial {
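    static int lport;      // local port the SSH tunnel listens on
    static String rhost;   // DB host as seen from the SSH server
    static int rport;      // DB port on the remote side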

    public static Session connectSession() {
        Session session = null;
        String user = "mySSHusername";       //ssh username
        String password = "mySSHpassword";   //ssh password
        String host = "myHOSTorIP";          //ssh host/IP
        int port = 22;
        try {
            JSch jsch = new JSch();
            session = jsch.getSession(user, host, port);
            lport = 4321;
            rhost = "localhost";
            rport = 3306;
            session.setPassword(password);
            // Skips host key verification: convenient for a trial, unsafe on untrusted networks
            session.setConfig("StrictHostKeyChecking", "no");
            System.out.println("Establishing Connection...");
            session.connect();

            // Forward localhost:lport to rhost:rport through the SSH session
            int assignedPort = session.setPortForwardingL(lport, rhost, rport);
            System.out.println("localhost:" + assignedPort + " -> " + rhost + ":" + rport);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return session;
    }

    public static void main(String[] args) {
        Session ses = null;
        try {
            ses = connectSession();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        System.out.println("An example for accessing remote db through ssh!");
        Connection con = null;
        String driver = "com.mysql.jdbc.Driver";
        // Connect to the local end of the tunnel, not to the remote host directly
        String url = "jdbc:mysql://localhost:" + lport + "/";

        String db = "dbname";                 //db name
        String dbUser = "root";               //db username
        String dbPasswd = "dbrootpassword";   //db userpassword
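        try {
            Class.forName(driver);
            con = DriverManager.getConnection(url + db, dbUser, dbPasswd);
            String sql = "SELECT name FROM mytable";
            // try-with-resources closes the statement and result set automatically
            try (PreparedStatement ps = con.prepareStatement(sql);
                 ResultSet rset = ps.executeQuery()) {
                while (rset.next()) {
                    System.out.println("Name : " + rset.getString("name"));
                }
            } catch (SQLException s) {
                System.out.println("SQL statement is not executed! " + s.getMessage());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always release the DB connection and the SSH tunnel, even after a failure
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if (ses != null) {
                ses.disconnect();
            }
        }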
    }
}
