Cloudera Implementation Guide for Production Deployments
In this article I will cover a detailed step-by-step guide for installing Cloudera CDH 5.14
using Cloudera Manager with an external database, and for creating a Hadoop cluster. This
is the recommended path for all production deployments.
The standard Cloudera installation guide was rather confusing for me: it keeps looping
between different URLs, which makes it hard to follow a clear implementation path. Some
steps do not work in the documented order, and some need a different syntax.
Here I am sharing a clear and easy path to follow, with references. Please feel free to reach
me for any clarifications or suggestions for improvement :)
Contacts:
Name: Ahmed Mekawy
Email: ahmedmekawy@hotmail.com
LinkedIn: https://www.linkedin.com/in/ahmed-mekawy-1ba11031/
Please feel free to reach me if you need to set up a production environment or want
administration training classes; I will be happy to help. Let's get started:
Implementation Overview:
Install and configure the database, install the Oracle JDK
– Database should be external for production deployments (this is what we will do here)
– Embedded PostgreSQL is okay for testing or ‘proof of concept’ work
Ensure access to the Cloudera software repositories
– For Cloudera Manager
– For CDH
Install Cloudera Manager and agents
Install the CDH Parcel services or RPMs for the services required on each host in the cluster
Implementation Environment Planning:
I am using VirtualBox to create a VM with CentOS 7; my hostname is cloudera.
The VM has 5 GB RAM, 15 GB disk space, one network card, and Internet access.
I will use MySQL as the external database for Cloudera Manager and CDH components.
For a different setup, you only need to make sure the certified support matrix and
capacity planning are in place; the rest of the steps are exactly the same as in this guide.
Review the following links:
Please review CDH 5 and Cloudera Manager 5 Requirements and Supported Versions.
Hardware Requirements Guide
Building local repositories for hosts with no internet access.
Implementation step by step:
login as: root
root@192.168.1.50's password:
Disable Firewall:
[root@cloudera ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-03-05 09:07:48 EST; 1min 10s ago
[root@cloudera ~]# service firewalld stop
Redirecting to /bin/systemctl stop firewalld.service
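The transcript above only stops firewalld for the current boot. A minimal sketch of stopping it and also keeping it off after a reboot is below. The `systemctl disable` call and the `DRY_RUN` variable are my additions for illustration and are not part of the original steps; with `DRY_RUN` left at its default the function only prints what it would run.

```shell
#!/usr/bin/env bash
# Sketch: stop firewalld now and keep it from starting at boot.
# DRY_RUN is a hypothetical helper variable (default 1 = print only).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

disable_firewall() {
  # Same effect as "service firewalld stop" in the transcript above.
  run systemctl stop firewalld
  # Assumption: not in the original steps; prevents firewalld starting on reboot.
  run systemctl disable firewalld
}
```

To actually apply it, run `DRY_RUN=0 disable_firewall` as root. On a real production network you may prefer to open the required ports instead of disabling the firewall completely.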
Transaction Summary
================================================================================
Install 1 Package
Total download size: 135 M
Installed size: 279 M
Is this ok [y/d/N]: y
Downloading packages:
Installed:
oracle-j2sdk1.7.x86_64 0:1.7.0+update67-1
Complete!
Install Cloudera Manager Components:
[root@cloudera yum.repos.d]# yum install cloudera-manager-daemons cloudera-manager-server
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirror.airenetworks.es
* extras: mirror.crazynetwork.it
* updates: mirrors.prometeus.net
Resolving Dependencies
--> Running transaction check
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
cloudera-manager-daemons
x86_64 5.14.1-1.cm5141.p0.1.el7 cloudera-manager 700 M
cloudera-manager-server x86_64 5.14.1-1.cm5141.p0.1.el7 cloudera-manager 8.5 k
Transaction Summary
================================================================================
Install 2 Packages (+27 Dependent packages)
Total size: 711 M
Total download size: 700 M
Installed size: 918 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
cloudera-manager-daemons-5.14.1-1.cm5141.p0.1.el7.x86_64.r | 700 MB 33:36
Installed:
cloudera-manager-daemons.x86_64 0:5.14.1-1.cm5141.p0.1.el7
cloudera-manager-server.x86_64 0:5.14.1-1.cm5141.p0.1.el7
Complete!
[root@cloudera yum.repos.d]#
Installing the MySQL database:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ig_mysql.html#cmig_topic_5_5
[root@cloudera yum.repos.d]# yum install mysql-server
No package mysql-server available.
Error: Nothing to do
[root@cloudera yum.repos.d]#
MySQL is not in the default repos for CentOS 7. The right approach is to download the MySQL community
release package, which adds the needed repo file.
[root@cloudera yum.repos.d]# wget https://repo.mysql.com//mysql57-community-release-el7-11.noarch.rpm
100%[======================================>] 25,680 --.-K/s in 0.08s
2018-03-05 13:26:51 (302 KB/s) - ‘mysql57-community-release-el7-11.noarch.rpm’ saved [25680/25680]
[root@cloudera yum.repos.d]# rpm -ivh mysql57-community-release-el7-11.noarch.rpm
warning: mysql57-community-release-el7-11.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql57-community-release-el7-11 ################################# [100%]
[root@cloudera yum.repos.d]# ls
CentOS-Base.repo CentOS-Media.repo mysql-community.repo
CentOS-CR.repo CentOS-Sources.repo mysql-community-source.repo
CentOS-Debuginfo.repo CentOS-Vault.repo
CentOS-fasttrack.repo cloudera-manager.repo
[root@cloudera yum.repos.d]# df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/centos-root 14616576 2260784 12355792 16% /
Active: active (running) since Mon 2018-03-05 14:09:00 EST; 29s ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Retrieving the MySQL auto-generated password:
[root@cloudera mysql]# grep 'temporary password' /var/log/mysqld.log
2018-03-05T19:08:56.327113Z 1 [Note] A temporary password is generated for root@localhost: HFauGGUl=6Fh
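If you want to script this step, here is a small sketch that pulls just the password out of the log. The function name `mysql_temp_password` is made up for this example. It assumes the log line has the same shape as the one shown above, with the password as the last field on the line.

```shell
#!/usr/bin/env bash
# Print the temporary root password recorded in a mysqld log file.
# $1: log path (defaults to /var/log/mysqld.log, as used in the transcript).
mysql_temp_password() {
  local log_file="${1:-/var/log/mysqld.log}"
  # The password is the last whitespace-separated field of the matching line.
  # tail -n 1 keeps the most recent entry if the log holds more than one.
  grep 'temporary password' "$log_file" | tail -n 1 | awk '{print $NF}'
}
```

Calling `mysql_temp_password` with no argument reads the default log location; pass another path if your log lives elsewhere.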
Removing password validation plugin:
[root@cloudera mysql]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.21
mysql> uninstall plugin validate_password;
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.
mysql> alter user root@localhost IDENTIFIED BY 'ABCxyz$123456';
Query OK, 0 rows affected (0.00 sec)
mysql> uninstall plugin validate_password;
Query OK, 0 rows affected (0.01 sec)
mysql>
[root@cloudera mysql]# /usr/bin/mysql_secure_installation
Securing the MySQL server deployment.
Enter password for user root:
VALIDATE PASSWORD PLUGIN can be used to test passwords
and improve security. It checks the strength of password
and allows the users to set only those passwords which are
secure enough. Would you like to setup VALIDATE PASSWORD plugin?
Press y|Y for Yes, any other key for No: No
Using existing password for root.
Change the password for root ? ((Press y|Y for Yes, any other key for No) : y
New password:
Re-enter new password:
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.
Remove anonymous users? (Press y|Y for Yes, any other key for No) : Y
Success.
Normally, root should only be allowed to connect from 'localhost'. This ensures that someone cannot
guess at the root password from the network.
Disallow root login remotely? (Press y|Y for Yes, any other key for No) : N
... skipping.
By default, MySQL comes with a database named 'test' that anyone can access. This is also intended
only for testing, and should be removed before moving into a production environment.
Remove test database and access to it? (Press y|Y for Yes, any other key for No) : Y
- Dropping test database...
Success.
- Removing privileges on test database...
Success.
Reloading the privilege tables will ensure that all changes made so far will take effect immediately.
Reload privilege tables now? (Press y|Y for Yes, any other key for No) : Y
Success.
All done!
[root@cloudera mysql]#
Download and install the MySQL JDBC client driver:
[root@cloudera backup]# wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.45.tar.gz
2018-03-05 14:24:02 (104 KB/s) - ‘mysql-connector-java-5.1.45.tar.gz’ saved [3467861/3467861]
[root@cloudera backup]# ls
mysql-connector-java-5.1.45.tar.gz
[root@cloudera backup]# tar xzf mysql-connector-java-5.1.45.tar.gz
[root@cloudera backup]# ls
mysql-connector-java-5.1.45 mysql-connector-java-5.1.45.tar.gz
[root@cloudera backup]# cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar /usr/share/java/mysql-connector-java.jar
cp: cannot create regular file ‘/usr/share/java/mysql-connector-java.jar’: No such file or directory
[root@cloudera backup]# mkdir -p /usr/share/java/
[root@cloudera backup]# cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar /usr/share/java/mysql-connector-java.jar
[root@cloudera backup]#
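The first cp above failed only because /usr/share/java did not exist yet. A small sketch that always creates the target directory before copying avoids that retry. The function name `install_jdbc_jar` is hypothetical; the default destination is the path used in this guide.

```shell
#!/usr/bin/env bash
# Copy a Connector/J jar to its destination, creating the directory first.
# $1: path to the extracted *-bin.jar
# $2: destination (defaults to the path used in this guide)
install_jdbc_jar() {
  local src_jar="$1"
  local dest_jar="${2:-/usr/share/java/mysql-connector-java.jar}"
  # mkdir -p is a no-op when the directory already exists.
  mkdir -p "$(dirname "$dest_jar")" && cp "$src_jar" "$dest_jar"
}
```

For the files in this guide the call would be `install_jdbc_jar mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar`, run as root.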
Tidy up MySQL by moving the ib_logfiles out of the way, then create the needed database:
[root@cloudera backup]# systemctl stop mysqld
[root@cloudera backup]# mv /var/lib/mysql/ib_logfile0 /backup
[root@cloudera backup]# mv /var/lib/mysql/ib_logfile1 /backup
[root@cloudera etc]# mysql -uroot -p
Enter password:
mysql> create database rman DEFAULT CHARACTER SET utf8;
Query OK, 1 row affected (0.00 sec)
mysql> grant all on rman.* TO 'rman'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected, 1 warning (0.00 sec)
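If you need to repeat this pair of statements for more than one database, a small generator keeps the syntax consistent. This is only a sketch: `db_setup_sql` is a made-up helper name, and it prints exactly the same statement forms used above without running anything. As far as I know, the `GRANT ... IDENTIFIED BY` form is deprecated in MySQL 5.7 (most likely the reason for the warning above) and is no longer accepted in MySQL 8.0, so treat it as specific to this version.

```shell
#!/usr/bin/env bash
# Print the CREATE DATABASE / GRANT pair used above for one database.
# Arguments: database name, user name, password (all supplied by the caller).
# Note: values are pasted as-is, so a password containing a single quote
# would break the generated SQL.
db_setup_sql() {
  local db="$1" user="$2" pass="$3"
  printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$db"
  printf "GRANT ALL ON %s.* TO '%s'@'localhost' IDENTIFIED BY '%s';\n" "$db" "$user" "$pass"
}
```

For example, `db_setup_sql rman rman password | mysql -uroot -p` would apply the same statements shown in the session above.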
Configure Cloudera Manager to use MySQL as its external database:
[root@cloudera etc]# /usr/share/cmf/schema/scm_prepare_database.sh mysql -h localhost -uroot -pwelcome1 --scm-host localhost scm scm scm
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
Verifying that we can write to /etc/cloudera-scm-server
Mon Mar 05 14:46:56 EST 2018 WARN: Establishing SSL connection without server's identity verification
is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection
must be established by default if explicit option isn't set. For compliance with existing applications not
using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by
setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
Mon Mar 05 14:46:58 EST 2018 WARN: Establishing SSL connection without server's identity verification
is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection
must be established by default if explicit option isn't set. For compliance with existing applications not
using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by
setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[ main] DbCommandExecutor INFO Successfully connected to database.
All done, your SCM database is configured correctly!
Start the Cloudera Manager server:
[root@cloudera ~]# service cloudera-scm-server start
[root@cloudera ~]# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
2018-03-05 14:58:45,006 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Finished constructing repo:2018-03-05T19:58:45.006Z
2018-03-05 14:58:45,767 INFO WebServerImpl:org.mortbay.log: jetty-6.1.26.cloudera.4
2018-03-05 14:58:45,768 INFO WebServerImpl:org.mortbay.log: Started SelectChannelConnector@0.0.0.0:7180
2018-03-05 14:58:45,768 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
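Instead of watching the log by eye with tail -f, you can poll for the "Started Jetty server" message shown above. The sketch below uses a hypothetical function name, `wait_for_cm_server`, and a timeout value I chose for illustration. It only checks for the message text, so a message left over from a previous start in the same log file would also match.

```shell
#!/usr/bin/env bash
# Return success once the log contains the Jetty startup message.
# $1: log path (defaults to the Cloudera Manager server log used above)
# $2: timeout in seconds (default 300, an arbitrary choice for this sketch)
wait_for_cm_server() {
  local log_file="${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}"
  local timeout="${2:-300}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if grep -q 'Started Jetty server' "$log_file" 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # Timed out without seeing the startup message.
  return 1
}
```

A typical use would be `service cloudera-scm-server start && wait_for_cm_server && echo "Cloudera Manager is up on port 7180"`.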
The installation has completed successfully.
Now open a web browser at the VM IP address on port 7180 to start the agents’ deployment and the CDH
cluster setup.