-by Ankit Kansal and Nayan Naik
1) Creating Hive tables and browsing through them
hive> CREATE TABLE pokes (foo INT, bar STRING);
This command creates a table called pokes with two columns, the first an integer and the second a string.
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
This command creates a table called invites with two columns and a partition column called ds. The partition column is a virtual column: it is not part of the data itself but is derived from the partition that a particular dataset is loaded into.
By default, tables are assumed to be of text input format and the delimiters are assumed to be ^A (ctrl-a).
hive> SHOW TABLES; lists all the tables
hive> DESCRIBE invites; shows the list of columns
As for altering tables, table names can be changed and additional columns can be added:
hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
hive> ALTER TABLE events RENAME TO 3koobecaf;
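Once you are done experimenting, a table can be removed with a standard DROP statement (shown here for the pokes table created above):
hive> DROP TABLE pokes;
Note that dropping a managed table like this also deletes its data from the warehouse directory, not just the table definition.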
2) Loading data from flat files into Hive:
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
This loads a file containing two ctrl-a-separated columns into the pokes table. 'LOCAL' signifies that the input file is on the local file system; if 'LOCAL' is omitted, Hive looks for the file in HDFS.
-No verification of data against the schema is performed by the load command.
-If the file is in HDFS, it is moved into the Hive-controlled file system namespace.
The root of the Hive directory is specified by the option hive.metastore.warehouse.dir in hive-default.xml. We advise users to create this directory before trying to create tables via Hive.
hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
hive> LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');
The two LOAD statements above load data into two different partitions of the table invites. Table invites must be created as partitioned by the key ds for this to succeed.
hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
The above command loads data from an HDFS file/directory into the table.
Note that loading data from HDFS will result in moving the file/directory. As a result, the operation is almost instantaneous.
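Once the data is loaded you can query it. Restricting the query on the partition column ds lets Hive scan only the matching partition rather than the whole table; a sketch, using the invites table loaded above:
hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';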
To connect to Hive using JDBC clients, we need to run Hive servers and connect to them from the client.
Note that you will need to run one hiveserver per client, since accessing the Hive server from multiple clients may lead to concurrency issues. Although hiveserver is multithreaded, connecting to it from multiple clients is not recommended. Hive servers connect to the metastore, so you will also need to have the metastore service running.
You may refer to this link for how to write a JDBC client – https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-JDBCClientSampleCode
Are you connecting to Hive using the Hive CLI? In that case, you may run a single metastore or multiple metastores, and all of the Hive clients can connect to the metastore and run queries happily.
If you have a MySQL metastore configured, all you need to do is set the parameter javax.jdo.option.ConnectionURL in your hive-site.xml to the JDBC URL of the existing MySQL server and metastore database, and you should be able to connect to the existing store.
Make sure that in your hive-site.xml the parameter hive.metastore.local is set to false, that javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword are set to a username and password on the MySQL server you are connecting to, and that all privileges on the database you are using as your metastore are granted to this user. Last but not least, make sure you change hive.metastore.uris to point to the host running the metastore service.
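For example, the metastore URI entry in hive-site.xml would look like the following (fill in your own metastore host; 9083 is the conventional default thrift port, adjust if yours differs):
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<fill in with metastore host>:9083</value>
</property>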
You don't need to run an SQL script for MySQL. Follow these simple steps to set up your MySQL metastore.
Create a file 'hive-site.xml' under the Hive conf directory and set the following configuration parameters in that file (the property names below match the descriptions shipped in hive-default.xml and the parameters discussed above; fill in the placeholder values for your environment):
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
  <description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value><fill in with JDBC URL of your MySQL metastore database></value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value><fill in with user name></value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value><fill in with password></value>
  <description>password to use against metastore database</description>
</property>
If you want to use a specific port for the thrift URIs, you will need to create a file called "hive-env.sh" in the conf folder and add the port setting to that file (assuming you want to expose thrift URIs on port 9090).
Once you have these configurations set up, all you need to do is start the Hive metastore service using the command:
hive --service metastore
This service runs in the foreground. You may use nohup to run metastore in the background. You should be good to go.
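For example, to run the metastore in the background (assuming hive is on your PATH):
nohup hive --service metastore &
nohup keeps the service alive after you log out, and the trailing '&' detaches it from your terminal.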