Tuesday, September 6, 2016

Installing Python libraries to work with scikit-learn


I was trying to install scikit-learn, which has the following requirements/dependencies:

    Python (>= 2.6 or >= 3.3),
    NumPy (>= 1.6.1), and SciPy (>= 0.9).

The installation page, http://scikit-learn.org/stable/install.html, suggests
not installing it with yum. That left pip as the only option, but I ran into
trouble using pip to install NumPy: the moment I tried to upgrade NumPy with
pip, it downgraded NumPy to 1.1.1 instead.
  
After searching for a while, I found that the problem can be solved by forcing pip to install specific versions:

    pip install -I numpy==1.6.1
    pip install -I scipy==0.9.0

Next, install the 'Development Tools' package group using yum:

    yum group install 'Development Tools' -y

Finally, install scikit-learn:

    pip install scikit-learn
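
Because the trouble here was pip silently installing an older NumPy, it may help to confirm the installed versions against the minimums before installing scikit-learn. The following is a minimal sketch, not part of the original procedure: the helper names `version_ge` and `check_module` are hypothetical, and it assumes GNU `sort` (for the `-V` version-sort option) and a `python` executable on the PATH.

```shell
#!/bin/sh
# version_ge INSTALLED REQUIRED
# Succeeds when INSTALLED is greater than or equal to REQUIRED.
# Assumes GNU sort, whose -V option sorts version strings numerically.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_module MODULE REQUIRED (hypothetical helper)
# Asks Python for MODULE.__version__ and compares it with REQUIRED.
check_module() {
    installed="$(python -c "import $1; print($1.__version__)" 2>/dev/null)" || {
        echo "$1: not importable"
        return 1
    }
    if version_ge "$installed" "$2"; then
        echo "$1 $installed satisfies >= $2"
    else
        echo "$1 $installed is older than $2"
        return 1
    fi
}
```

With the functions loaded, `check_module numpy 1.6.1` and `check_module scipy 0.9` report whether each library meets the stated minimum.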


 

Thursday, February 18, 2016

Changing the Database used by Oozie


      


Recently I was working with a customer when we noticed that the Derby database
used by Oozie was corrupted. One option was to reinstall Derby. As this was a production system, we asked them instead to install the Oozie schema on the existing MySQL database,
which is also used by Hive.


Following are the steps to do that:

1. Remove the Oozie service using the Ambari REST API. First, stop the service:

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://biginsights40.ibm.com:8080/api/v1/clusters/BI41/services/OOZIE


2. Then remove the service by running:

[root@bdsup006 ~]# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://biginsights40.ibm.com:8080/api/v1/clusters/BI41/services/OOZIE

** Replace biginsights40.ibm.com with the Ambari server name (FQDN) and
BI41 with the cluster name.
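
Since both REST calls differ only in the server name and cluster name, they can be parameterized. The following is a rough sketch under stated assumptions: the function names are hypothetical, the `admin:admin` credentials and port 8080 are copied from the commands above, and the commands are only printed for review rather than executed. The stop request uses `INSTALLED`, which is the state Ambari uses for a stopped service.

```shell
#!/bin/sh
# Placeholder values: replace with your Ambari server FQDN and cluster name.
AMBARI_HOST="biginsights40.ibm.com"
CLUSTER="BI41"

# service_url HOST CLUSTER SERVICE
# Builds the REST endpoint for one service (port 8080, as in the commands above).
service_url() {
    printf 'http://%s:8080/api/v1/clusters/%s/services/%s\n' "$1" "$2" "$3"
}

# JSON body of the stop request; INSTALLED means "installed but not running".
stop_payload() {
    printf '%s\n' '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}'
}

# Prints (does not run) the stop and delete commands for the Oozie service.
print_oozie_removal_commands() {
    url="$(service_url "$AMBARI_HOST" "$CLUSTER" OOZIE)"
    echo "curl -u admin:admin -H \"X-Requested-By: ambari\" -X PUT -d '$(stop_payload)' $url"
    echo "curl -u admin:admin -H \"X-Requested-By: ambari\" -X DELETE $url"
}
```

Calling `print_oozie_removal_commands` after setting the two variables prints the two curl commands, which can be checked and then pasted into a shell.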


3. Go back to the Ambari web console, log out and log in again, and verify that the Oozie service is removed.

4. To add the Oozie service back with MySQL Server as the database, use the following steps:
   

   a. Click the Actions button at the bottom of the Ambari screen and select Add Services.

   b. Select the Oozie service to be added, select the master node, and choose to install the Oozie 'client'
      on all the nodes. Click Next.

   c. On the next page, when prompted to select the database, select Existing MySQL Database
      as the preferred database for Oozie. Once MySQL is selected, Ambari displays a message
      asking you to run the following on the machine where the ambari-server service is running:

      ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

   d. Now create the schema that is required for Oozie.

      ssh to the machine where MySQL Server is installed, connect to the database, and
      run the following commands:

    [root@biginsights40 ~]# mysql
    Welcome to the MySQL monitor. Commands end with ; or \g.
    Your MySQL connection id is 16
    Server version: 5.1.73 Source distribution

    Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
   Oracle is a registered trademark of Oracle Corporation and/or its
   affiliates. Other names may be trademarks of their respective owners.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql> create database oozie;
    mysql> CREATE USER 'oozie'@'%' IDENTIFIED BY 'oozie';
    mysql> CREATE USER 'oozie'@'localhost' IDENTIFIED BY 'oozie';
    mysql> CREATE USER 'oozie'@'biginsights40.ibm.com' IDENTIFIED BY 'oozie';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'%';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'localhost';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'biginsights40.ibm.com';

    Replace biginsights40.ibm.com with the name of the machine in your cluster where MySQL is set up.
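
    Because only the host name changes between clusters, the statements above can be generated rather than typed by hand. This is a minimal sketch, not a required step: the function names are hypothetical, and the 'oozie' password and the `*.*` grant scope simply mirror the session above (a stronger password would be sensible in practice).

```shell
#!/bin/sh
# oozie_user_sql HOST
# Emits the CREATE USER and GRANT statements for one host entry.
# The password 'oozie' and the *.* scope are copied from the session above.
oozie_user_sql() {
    printf "CREATE USER 'oozie'@'%s' IDENTIFIED BY 'oozie';\n" "$1"
    printf "GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'%s';\n" "$1"
}

# oozie_schema_sql MYSQL_HOST
# Emits the full script: the database, then one user per host entry
# ('%', localhost, and the MySQL server's own host name).
oozie_schema_sql() {
    mysql_host="$1"
    echo "CREATE DATABASE oozie;"
    for h in '%' localhost "$mysql_host"; do
        oozie_user_sql "$h"
    done
}
```

    The output can be reviewed first and then piped to the client, for example `oozie_schema_sql your-mysql-host | mysql`, where `your-mysql-host` stands for the real machine name.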

    Now click Test Connection to verify that you can connect to the newly
    created oozie database.

    Click Next and complete the component install.
    At the end, you might see an incomplete status for the component configuration.
    This is expected and is addressed by the service restart described next. Click OK
    and continue to complete the installation.

    Log back in to the Ambari web console, select the MAPREDUCE component, and STOP
    and then START the service. The need for this restart is why the incomplete
    status is shown in the previous step.

    After this, the cluster should be working normally.