Friday, June 24, 2016

Hadoop Links

hadoop-filesystem-commands


https://hadoopquiz.blogspot.in/2016/05/hadoop-filesystem-commands.html


http://www.crackinghadoop.com/create-a-hive-table-in-hue/

Monday, June 20, 2016

Exception Handling interview questions

What is an Exception in Java?

An exception is an event, which occurs during the execution of a program, that disrupts the normal flow of the program's instructions. When an error occurs within a method, the method creates an object and hands it off to the runtime system. The object, called an exception object, contains information about the error, including its type and the state of the program when the error occurred. Creating an exception object and handing it to the runtime system is called throwing an exception. 
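As a small illustration (the method name and message are made up, not taken from any particular API), creating and throwing an exception object looks like this:

public class ThrowDemo {
    // creates an exception object and hands it to the runtime system
    static int checkAge(int age) {
        if (age < 0) {
            throw new IllegalArgumentException("age cannot be negative: " + age);
        }
        return age;
    }

    public static void main(String[] args) {
        checkAge(-5);   // the runtime system now searches for a handler for this exception
    }
}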



What happens when a method throws an exception?

After a method throws an exception, the runtime system attempts to find an exception handler to handle it. The search begins with the method in which the error occurred and proceeds through the call stack in the reverse order in which the methods were called. When an appropriate handler is found, the runtime system passes the exception to the handler. If the runtime system is unable to find an appropriate exception handler, the runtime system terminates.

For example: suppose the main method calls a method named A, A calls a method B, and B calls a method C, and C throws an exception. The call chain is then main -> A -> B -> C.

We can understand this in the steps given below:

  1. The exception occurs in method C(). The method creates an exception object and gives it to the runtime system.
  2. The runtime system starts looking for a matching exception handler. The search starts in method C(), but because it does not provide any exception handler, the search propagates down the call stack.
  3. The runtime system looks for a handler in method B(). B() also does not provide a handler, so the search propagates down the call stack.
  4. A() provides a try/catch block that catches the type of exception that was thrown by C(), so the runtime system passes the exception to this handler (see the sketch below).
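A minimal sketch of this call chain, using the method names from the example above (the ArithmeticException is chosen purely for illustration):

public class CallStackDemo {
    public static void main(String[] args) {
        A();
    }

    static void A() {
        try {
            B();                              // A() provides the handler
        } catch (ArithmeticException e) {
            System.out.println("Handled in A(): " + e.getMessage());
        }
    }

    static void B() {
        C();                                  // no handler here, the exception propagates
    }

    static void C() {
        int x = 10 / 0;                       // throws ArithmeticException
    }
}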

What is OutOfMemoryError in Java?

OutOfMemoryError in Java is a subclass of java.lang.VirtualMachineError and is thrown by the JVM when it runs out of heap memory. We can fix this error by giving the Java application more memory through JVM options, which must be placed before the class name:
$> java -Xms1024m -Xmx1024m -XX:PermSize=64M -XX:MaxPermSize=256m MyProgram
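For illustration only, a small program like the one below (the class name is made up) will eventually fail with java.lang.OutOfMemoryError: Java heap space when run with a small heap, e.g. java -Xmx64m EatMemory:

import java.util.ArrayList;
import java.util.List;

public class EatMemory {
    public static void main(String[] args) {
        List<long[]> blocks = new ArrayList<long[]>();
        while (true) {
            blocks.add(new long[1000000]);    // roughly 8 MB per iteration, until the heap is exhausted
        }
    }
}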


Can we have an empty catch block?


We can have an empty catch block, but it is a very bad programming practice. We should never have an empty catch block, because if an exception is caught by that block we will have no information about it, and it will be a nightmare to debug. There should be at least a logging statement to log the exception details to the console or to log files.
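A small sketch of the difference (the parsing failure is just an illustration):

public class CatchBlockDemo {
    public static void main(String[] args) {
        try {
            Integer.parseInt("not a number");
        } catch (NumberFormatException e) {
            // bad practice: empty catch block, the failure is silently swallowed
        }

        try {
            Integer.parseInt("not a number");
        } catch (NumberFormatException e) {
            // better: at least log the exception details so the problem can be debugged
            System.err.println("parsing failed: " + e);
        }
    }
}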

Can we write only try block without catch and finally blocks?
No, it causes a compilation error. A try block must be followed by either a catch block or a finally block; you can omit one of them, but not both.
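For example, a try block followed only by finally compiles, while a bare try does not:

public class TryFinallyDemo {
    public static void main(String[] args) {
        // valid: try followed by finally, with no catch
        try {
            System.out.println(Integer.parseInt(args[0]));
        } finally {
            System.out.println("finally always runs");
        }

        // invalid (compilation error): try with neither catch nor finally
        // try {
        //     System.out.println("hello");
        // }
    }
}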

What is re-throwing an exception in Java?
Exceptions raised in the try block are handled in the catch block. If the catch block is unable to handle that exception, it can re-throw it using the throw keyword. This is called re-throwing an exception.
try
{
    String s = null;
    System.out.println(s.length());   //This statement throws NullPointerException
}
catch(NullPointerException ex)
{
    System.out.println("NullPointerException is caught here");
  
    throw ex;     // Re-throwing the exception to the caller
}

Saturday, June 18, 2016

Sqoop Import and Export Lesson 3


Sqoop import into Hive Tables
_________________________________
hive> create table atab(id int , name string,
    >   sal int, sex string, dno int);
OK
Time taken: 2.004 seconds
hive>
The default database in Hive is "default"; its HDFS location is /user/hive/warehouse.
When a Hive table is created, a directory with the table name is created under the warehouse location.
  ex: table  atab
   hdfs:  /user/hive/warehouse/atab

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table samp -m 1  --hive-import --hive-table atab --append





hive> select * from atab;
OK
101     aaaa    10000   m       12
102     addaaa  20000   f       12
103     ada     20000   f       13
104     ada     50000   f       11
105     addda   70000   m       12
106     adddda  80000   m       11
107     xadddda 70000   f       12
Time taken: 0.217 seconds
hive>
____
hive> create database testx;
OK
Time taken: 0.029 seconds
hive> use testx
    > ;
OK
Time taken: 0.02 seconds
hive> create table dummy(a int);
OK
Time taken: 0.045 seconds
hive>

_____________
[training@localhost ~]$ hadoop fs -ls /user/hive/warehouse/atab
Found 1 items
-rw-r--r--   1 training supergroup        146 2016-06-05 19:22 /user/hive/warehouse/atab/part-m-00000
[training@localhost ~]$ hadoop fs -cat /user/hive/warehouse/atab/part-m-00000
101 aaaa 10000 m 12
102 addaaa 20000 f 12
103 ada 20000 f 13
104 ada 50000 f 11
105 addda 70000 m 12
106 adddda 80000 m 11
107 xadddda 70000 f 12
[training@localhost ~]$
Sqoop writes two types of delimiters:
 i) while writing into HDFS, Sqoop writes a ',' delimiter;
 ii) while writing into Hive tables, Sqoop writes the '\001' (Ctrl+A) delimiter.
 ____________________________
The default delimiter for a Hive table is '\001'.
If the Hive table has a different delimiter, how do we import?

___________
hive> create table ctab(id int, name string,
    >   sal int , sex string, dno int)    
    > row format delimited
    >    fields terminated by ',';
OK
Time taken: 0.044 seconds
[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table samp -m 1  --hive-import --hive-table ctab --append



hive> select * from ctab;
OK
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
Time taken: 0.074 seconds
hive>
Sqoop has written '\001' as the delimiter, but the table expects ','.
   The backend file (part-m-00000) contains no commas, so each entire line is treated as a single field. That field is a string, while the first column of the table (id) is an int, which is why id became NULL.
 Since there is only one field per line, the remaining columns name, sal, sex and dno also became NULL.
Solution: the delimiter should be changed at the time of importing.
_______________
hive> create table dtab like ctab;
[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table samp -m 1  --hive-import --hive-table dtab --append --fields-terminated-by ','
hive> select * from dtab;
OK
101     aaaa    10000   m       12
102     addaaa  20000   f       12
103     ada     20000   f       13
104     ada     50000   f       11
105     addda   70000   m       12
106     adddda  80000   m       11
107     xadddda 70000   f       12
Time taken: 0.073 seconds
hive>
_______________________________
[training@localhost ~]$ cat  >   samp1
100,200,300
300,600,800
100,3000,456
100,456,222
[training@localhost ~]$ hadoop fs -mkdir  exp
[training@localhost ~]$ hadoop fs -copyFromLocal samp1 exp

mysql> create table mysamp(a int, b int, c int);
Query OK, 0 rows affected (0.00 sec)
mysql> select * from mysamp;
Empty set (0.00 sec)

[training@localhost ~]$ sqoop export   --connect  jdbc:mysql://localhost/practice  --username root  --table mysamp  --export-dir  exp/samp1

mysql> select * from mysamp;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|  100 |  200 |  300 |
|  300 |  600 |  800 |
|  100 | 3000 |  456 |
|  100 |  456 |  222 |
+------+------+------+
4 rows in set (0.00 sec)
mysql>
[training@localhost ~]$ cat > samp2
10      20      30
12      34      90
12      45      67
[training@localhost ~]$ hadoop fs -copyFromLocal samp2 exp
[training@localhost ~]$
[training@localhost ~]$ sqoop export   --connect  jdbc:mysql://localhost/practice  --username root  --table mysamp  --export-dir  exp/samp2 --input-fields-terminated-by '\t'
Exporting Hive results into an RDBMS:
hive> create table results1(dno int,
   sex  string, tot int);

hive> insert overwrite table results1                  
    >   select dno, sex, sum(sal) from dtab
    >     group by dno, sex;
hive> select * from results1;
OK
11      f       50000
11      m       80000
12      f       90000
12      m       80000
13      f       20000
Time taken: 0.088 seconds
hive>
mysql> create table Aggr(dno int , sex char(1),
    ->    tot int);
Query OK, 0 rows affected (0.00 sec)
mysql>
[training@localhost ~]$ sqoop export   --connect  jdbc:mysql://localhost/practice  --username root  --table Aggr     --export-dir  /user/hive/warehouse/results1/000000_0        --input-fields-terminated-by '\001'
mysql> select * from Aggr;
+------+------+-------+
| dno  | sex  | tot   |
+------+------+-------+
|   11 | f    | 50000 |
|   11 | m    | 80000 |
|   12 | f    | 90000 |
|   12 | m    | 80000 |
|   13 | f    | 20000 |
+------+------+-------+
5 rows in set (0.00 sec)
mysql>
______________________


Sqoop Import Examples Lesson 2

__________________________________________


[training@localhost ~]$ sqoop import \
>  --connect jdbc:mysql://localhost/practice  \
>  --username root  \
>  --table samp -m 1 --columns name,sal,sal*0.1 \
>  --target-dir  sqimp7

The above import will fail, because Sqoop treats sal*0.1 as a column name in the table; since no such column exists, the import fails.

 In the --columns option we cannot give arithmetic expressions over columns, i.e. we cannot generate new fields at the time of importing:

 --columns  name,sal,sal*0.1    (not allowed)

________________________
--query option:
   imports the result of a select statement executed by the RDBMS into Hadoop.

 Any valid select statement is allowed.

For the select statement, a where clause is mandatory.

In the where clause any number of conditions can be given, but the token $CONDITIONS is mandatory.

$CONDITIONS is a placeholder that Sqoop substitutes at run time with the condition selecting each mapper's split of the data; only then is the statement submitted to the RDBMS and the import started.






[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice   --username root    --query 'select * from samp where $CONDITIONS' -m 1  --target-dir sqimp8

[training@localhost ~]$ hadoop fs -cat sqimp8/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
104,ada,50000,f,11
105,addda,70000,m,12
106,adddda,80000,m,11
107,xadddda,70000,f,12
[training@localhost ~]$

Filtering rows using --query:

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice   --username root    --query 'select * from samp where $CONDITIONS and sal>=50000 ' -m 1  --target-dir sqimp9





[training@localhost ~]$ hadoop fs -cat sqimp9/part-m-00000
104,ada,50000,f,11
105,addda,70000,m,12
106,adddda,80000,m,11
107,xadddda,70000,f,12
[training@localhost ~]$

___________

generating new fields at the time of importing.

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --query 'select  name,sal,sal*0.1,sal*0.2,sal+(sal*0.2)-(sal*0.1) from samp where $CONDITIONS'    -m 1  --target-dir  sqimp10



[training@localhost ~]$ hadoop fs -cat sqimp10/part-m-00000
aaaa,10000,1000.0,2000.0,11000.0
addaaa,20000,2000.0,4000.0,22000.0
ada,20000,2000.0,4000.0,22000.0
ada,50000,5000.0,10000.0,55000.0
addda,70000,7000.0,14000.0,77000.0
adddda,80000,8000.0,16000.0,88000.0
xadddda,70000,7000.0,14000.0,77000.0
[training@localhost ~]$



Note: 
  --table  and --query are mutually exclusive.
___________________________________

Merging tables at the time of importing.

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --query 'select * from samp where $CONDITIONS  union all select * from damp where $CONDITIONS'    -m 1 --target-dir sqimp11


[training@localhost ~]$ hadoop fs -cat sqimp11/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
104,ada,50000,f,11
105,addda,70000,m,12
106,adddda,80000,m,11
107,xadddda,70000,f,12
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
104,ada,50000,f,11
105,addda,70000,m,12
106,adddda,80000,m,11
107,xadddda,70000,f,12
[training@localhost ~]$

To eliminate duplicate records, use "union" (instead of "union all") between the select statements.

 For each select statement, $CONDITIONS is mandatory.
__________________________________


[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --query "select * from samp where sex='m' and \$CONDITIONS" -m 1  --target-dir sqimp14

[training@localhost ~]$ hadoop fs -cat sqimp14/part-m-00000
101,aaaa,10000,m,12
105,addda,70000,m,12
106,adddda,80000,m,11
[training@localhost ~]$

Most databases allow character values in either double quotes or single quotes.

 But some databases are stricter: character values must be in single quotes.
In such cases, the select statement of the --query option should be enclosed in double quotes.
 Inside a double-quoted shell string, $ has a special meaning, so it must be masked as
   \$CONDITIONS
______________________________

DURING MERGING, the column order of both select statements should be the same, e.g.:

 --query 'select name, sal from tab1
   where $CONDITIONS union all
     select name, sal from tab2
  where $CONDITIONS'

_______________________________________
IF TABLES HAVE different columns...

 tab1 --> id, name, sal, sex, dno, city

 tab2 --> id, name, sal, gender, age, dno

--query 'select id, name, sal, sex, 0 as age,
       dno, city from tab1 where $CONDITIONS
            union all
      select id, name, sal, gender as sex, age,
   dno, "***" as city from tab2
    where $CONDITIONS'

_____________________________________
[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --query 'select sex, sum(sal)  from samp where $CONDITIONS group by sex' -m 1  --target-dir sqimp15

-- In the above case also there is no involvement of a reducer, because the query (including the group by) is executed by the RDBMS and Sqoop only fetches the results, so it is still mapper-only functionality.

In the entire Sqoop process, there is "No Reducer Involvement".

___________________________________

 Sqoop supports both ETL and ELT models.

 For big data volumes, ETL is not recommended.

 First dump all the data into Hadoop (HDFS), and later perform the transformations using Pig/Hive/Spark/MapReduce, so that the processing happens in parallel and the transformations finish faster.

If the table volume is small and the transformations are simple, you can proceed with the ETL model.

_______________________________________


sqoop import can import data from

   i) RDBMS to HDFS
   ii) RDBMS to Hive
   iii) RDBMS to HBase

__________________

By default, Sqoop writes a comma (,) delimiter between fields.

To change the delimiter, use the following option:

  --fields-terminated-by  '\t'
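For example, reusing the practice database and samp table from Lesson 1 (the target directory name sqimp_tab is only illustrative):

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table samp -m 1  --fields-terminated-by '\t'  --target-dir sqimp_tab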

______________________________________

To import data into an existing directory, use:

    --append

Every time, a new file will be generated in the same directory.
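For example, re-running the Lesson 1 import into the existing sqimp1 directory (illustrative; a new part file is added alongside the earlier ones):

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table samp -m 1  --target-dir sqimp1  --append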
_________________________________________

sankara.deva2016@gmail.com


Sqoop Import Examples Lesson 1

________________________________________ 


$ mysql -u root

mysql>

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| datahub            |
| datamart           |
| halitics           |
| hivemetastore      |
| movielens          |
| mysql              |
| product            |
| siri               |
| training           |
+--------------------+
10 rows in set (0.11 sec)

mysql>

mysql> create database practice;
Query OK, 1 row affected (0.02 sec)

mysql> use practice;
Database changed


mysql> show tables;
Empty set (0.00 sec)

mysql> create table samp(id int primary key,
    ->     name char(10), sal int,
    ->   sex char(1), dno int);
Query OK, 0 rows affected (0.00 sec)

mysql> describe samp;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| id    | int(11)  | NO   | PRI | NULL    |       |
| name  | char(10) | YES  |     | NULL    |       |
| sal   | int(11)  | YES  |     | NULL    |       |
| sex   | char(1)  | YES  |     | NULL    |       |
| dno   | int(11)  | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql>

mysql>
mysql> insert into samp values(101,'aaaa',10000,'m',12)
    -> ;
Query OK, 1 row affected (0.01 sec)

mysql> insert into samp values(102,'addaaa',20000,'f',12);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(103,'ada',20000,'f',13);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(104,'ada',50000,'f',11);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(105,'addda',70000,'m',12);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(106,'adddda',80000,'m',11);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(107,'xadddda',70000,'f',12);
Query OK, 1 row affected (0.00 sec)

mysql>

mysql> select * from samp;
+-----+---------+-------+------+------+
| id  | name    | sal   | sex  | dno  |
+-----+---------+-------+------+------+
| 101 | aaaa    | 10000 | m    |   12 |
| 102 | addaaa  | 20000 | f    |   12 |
| 103 | ada     | 20000 | f    |   13 |
| 104 | ada     | 50000 | f    |   11 |
| 105 | addda   | 70000 | m    |   12 |
| 106 | adddda  | 80000 | m    |   11 |
| 107 | xadddda | 70000 | f    |   12 |
+-----+---------+-------+------+------+
7 rows in set (0.00 sec)

______________________

[training@localhost ~]$ sqoop import    --connect  jdbc:mysql://localhost/practice   --username root   --table  samp   --target-dir   sqimp1

[training@localhost ~]$ hadoop fs -ls sqimp1
Found 6 items
-rw-r--r--   1 training supergroup          0 2016-06-01 19:36 /user/training/sqimp1/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-06-01 19:36 /user/training/sqimp1/_logs
-rw-r--r--   1 training supergroup         42 2016-06-01 19:36 /user/training/sqimp1/part-m-00000
-rw-r--r--   1 training supergroup         38 2016-06-01 19:36 /user/training/sqimp1/part-m-00001
-rw-r--r--   1 training supergroup         21 2016-06-01 19:36 /user/training/sqimp1/part-m-00002
-rw-r--r--   1 training supergroup         45 2016-06-01 19:36 /user/training/sqimp1/part-m-00003
[training@localhost ~]$

[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00001
103,ada,20000,f,13
104,ada,50000,f,11
[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00002
105,addda,70000,m,12
[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00003
106,adddda,80000,m,11
107,xadddda,70000,f,12
[training@localhost ~]$

By default, Sqoop initiates 4 mappers.

 These mappers import the data in parallel.

That means the table is split into 4 parts,
 and each part is taken by a separate mapper.

How to control the number of mappers?

 Use the option -m <number>

  ex:   -m 2

[training@localhost ~]$ sqoop import \
>  --connect  jdbc:mysql://localhost/practice \
>  --username root  \
>  --table samp -m 2 --target-dir sqimp2

[training@localhost ~]$ hadoop fs -ls sqimp2
Found 4 items
-rw-r--r--   1 training supergroup          0 2016-06-01 19:48 /user/training/sqimp2/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-06-01 19:48 /user/training/sqimp2/_logs
-rw-r--r--   1 training supergroup         61 2016-06-01 19:48 /user/training/sqimp2/part-m-00000
-rw-r--r--   1 training supergroup         85 2016-06-01 19:48 /user/training/sqimp2/part-m-00001
[training@localhost ~]$ hadoop fs -cat sqimp2/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
[training@localhost ~]$

____________________

mysql> create table damp(id int,
    ->   name char(10), sal int, sex char(1),
    ->   dno int);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into damp select * from samp;
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> select * from damp;
+------+---------+-------+------+------+
| id   | name    | sal   | sex  | dno  |
+------+---------+-------+------+------+
|  101 | aaaa    | 10000 | m    |   12 |
|  102 | addaaa  | 20000 | f    |   12 |
|  103 | ada     | 20000 | f    |   13 |
|  104 | ada     | 50000 | f    |   11 |
|  105 | addda   | 70000 | m    |   12 |
|  106 | adddda  | 80000 | m    |   11 |
|  107 | xadddda | 70000 | f    |   12 |
+------+---------+-------+------+------+
7 rows in set (0.00 sec)

mysql>


If the table does not have a primary key,

 the number of mappers should be 1.


Why?

 When multiple mappers are initiated,
  the first mapper automatically points to the beginning record of the first split, but the remaining mappers cannot jump to the beginning of their splits, because there is no primary key to split on.

  So only sequential reading is allowed, from the beginning of the table to the end of the table.

   To make the whole table (beginning to end) one split,
 only one mapper should be initiated:

    -m 1

______________________

[training@localhost ~]$ sqoop import  --connect  jdbc:mysql://localhost/practice  --username root   --table damp -m 1  --target-dir sqimp4


[training@localhost ~]$ hadoop fs -ls sqimp4
Found 3 items
-rw-r--r--   1 training supergroup          0 2016-06-01 19:58 /user/training/sqimp4/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-06-01 19:58 /user/training/sqimp4/_logs
-rw-r--r--   1 training supergroup        146 2016-06-01 19:58 /user/training/sqimp4/part-m-00000
[training@localhost ~]$ hadoop fs -cat sqimp4/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
104,ada,50000,f,11
105,addda,70000,m,12
106,adddda,80000,m,11
107,xadddda,70000,f,12
[training@localhost ~]$

_______________________________

To filter rows at the time of importing:

[training@localhost ~]$ sqoop import  --connect  jdbc:mysql://localhost/practice  --username root   --table damp -m 1 --where 'sal<50000' --target-dir sqimp5



[training@localhost ~]$ hadoop fs -cat sqimp5/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
[training@localhost ~]$

______________________

[training@localhost ~]$ sqoop import  --connect  jdbc:mysql://localhost/practice  --username root   --table damp -m 1 --where 'sex="m"' --target-dir sqimp6


[training@localhost ~]$ hadoop fs -cat sqimp6/part-m-00000
101,aaaa,10000,m,12
105,addda,70000,m,12
106,adddda,80000,m,11
[training@localhost ~]$

_______________________

importing selective columns....

[training@localhost ~]$ sqoop import  --connect  jdbc:mysql://localhost/practice  --username root   --table damp -m 1 --columns name,sal,dno --target-dir sqimp7

[training@localhost ~]$ hadoop fs -cat sqimp7/part-m-00000
aaaa,10000,12
addaaa,20000,12
ada,20000,13
ada,50000,11
addda,70000,12
adddda,80000,11
xadddda,70000,12
[training@localhost ~]$

____________________________________

      sankara.deva2016@gmail.com

Hive Lab1

hive> show databases;


-- lists the existing databases

hive> create table mysamp(line string);
OK
Time taken: 0.543 seconds

hive> load data local inpath 'file1'             
    > into table mysamp;

[training@localhost ~]$ hadoop fs -ls /user/hive/warehouse/mysamp
Found 1 items
-rw-r--r--   1 training supergroup         50 2016-06-15 07:21 /user/hive/warehouse/mysamp/file1

Whenever a table is created under the default database, one directory with the table name is created in HDFS.

 When you load a file into the table,
 the file is copied into the table's backend directory.

 Location of the default database:
  /user/hive/warehouse

hive> select * from mysamp;
OK
aaaaaaaaaaa
aaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa
Time taken: 0.166 seconds
hive> 

When a select statement is submitted, Hive starts reading from the table's backend directory (all files of the directory).

[training@localhost ~]$ cat > file2
bbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbb

hive> load data local inpath 'file2'
    >  into table mysamp;


[training@localhost ~]$ hadoop fs -ls /user/hive/warehouse/mysamp
Found 2 items
-rw-r--r--   1 training supergroup         50 2016-06-15 07:21 /user/hive/warehouse/mysamp/file1
-rw-r--r--   1 training supergroup         55 2016-06-15 07:28 /user/hive/warehouse/mysamp/file2
[training@localhost ~]$ 

hive> select * from mysamp;
OK
aaaaaaaaaaa
aaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbb
Time taken: 0.065 seconds
hive> 

-- here, Hive is reading from all files of the backend directory.

hive> drop table mysamp;

[training@localhost ~]$ hadoop fs -ls /user/hive/warehouse/mysamp
ls: Cannot access /user/hive/warehouse/mysamp: No such file or directory.

-- as the above operation shows,
  when the table is dropped,
    the backend directory in HDFS is also deleted.

--- that means metadata and data are both dropped.
  This is the behavior of an "inner table".

two types of tables.
____________________
 i) inner table (the default).
  -- if the table is dropped, both metadata and data are deleted:
  the table is deleted from Hive,
  and the backend directory is deleted from HDFS.

 ii) external table.
  -- if the table is dropped, only the metadata is deleted:

   the table is deleted from Hive, but the backend directory with its data files is still available,

  so that we can reuse the data.
creating external table:


hive> create external table ursamp(line  string);

hive> load data local inpath 'file1' 
    >  into table ursamp;

hive> load data local inpath 'file2'
    >  into table ursamp;

hive> 

[training@localhost ~]$ hadoop  fs -ls /user/hive/warehouse/ursamp
Found 2 items
-rw-r--r--   1 training supergroup         50 2016-06-15 07:40 /user/hive/warehouse/ursamp/file1
-rw-r--r--   1 training supergroup         55 2016-06-15 07:40 /user/hive/warehouse/ursamp/file2
[training@localhost ~]$ 

hive> select * from ursamp;
OK
aaaaaaaaaaa
aaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbb
Time taken: 0.081 seconds
hive> 

hive> drop table ursamp;


[training@localhost ~]$ hadoop  fs -ls /user/hive/warehouse/ursamp
Found 2 items
-rw-r--r--   1 training supergroup         50 2016-06-15 07:40 /user/hive/warehouse/ursamp/file1
-rw-r--r--   1 training supergroup         55 2016-06-15 07:40 /user/hive/warehouse/ursamp/file2
[training@localhost ~]$ 


From Hive, ursamp is deleted,
  but the ursamp directory with its data files is still safely available in the warehouse location.

The advantage of an external table is reuse of the data.

___________________________________

How to reuse?

hive> create table ursamp(line string);

-- create behavior:

   if the table's directory (ursamp) is already available, Hive will use it;
 if it is not available, Hive will create the directory.
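A quick check (assuming file1 and file2 are still present under /user/hive/warehouse/ursamp, as shown above): selecting from the recreated table should list their contents again.

hive> select * from ursamp;
-- expected: the lines of file1 and file2, because the recreated table
-- picks up the existing ursamp directory and its data files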

_____________________________

creating databases.

hive> create database mydb;
OK
Time taken: 0.029 seconds
hive> use mydb;                       
OK
Time taken: 0.013 seconds
hive> create table urtab(line string);
OK
Time taken: 0.04 seconds
hive> 

[training@localhost ~]$ hadoop fs -ls /user/hive/warehouse/mydb.db
Found 1 items
drwxr-xr-x   - training supergroup          0 2016-06-15 07:56 /user/hive/warehouse/mydb.db/urtab
[training@localhost ~]$ 

In HDFS, under the warehouse location, one directory is created with the database name and a .db extension:

    mydb.db

 hive> load data local inpath 'file1' 
    >   into table urtab;

[training@localhost ~]$ hadoop fs -ls /user/hive/warehouse/mydb.db/urtab
Found 1 items
-rw-r--r--   1 training supergroup         50 2016-06-15 07:59 /user/hive/warehouse/mydb.db/urtab/file1

________________________________

Creating a table's directory under a custom location:

hive> create table mytab(line string) 
    >  location '/user/myloc';
OK
Time taken: 0.034 seconds
hive> load data local inpath 'file1' into table mytab;
Copying data from file:/home/training/file1
Copying file: file:/home/training/file1
Loading data to table default.mytab
OK
Time taken: 0.131 seconds
hive> 

[training@localhost ~]$ hadoop fs -ls /user/myloc
Found 1 items
-rw-r--r--   1 training supergroup         50 2016-06-15 08:02 /user/myloc/file1
[training@localhost ~]$ 


--- a directory with the name /user/myloc is created for the mytab table.