My background: four weeks in the Hadoop world. I have dabbled a bit in Hive, Pig, and Hadoop using Cloudera's Hadoop VM, and I have read Google's papers on Map-Reduce and GFS (PDF link).
I understand that:
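Pig's language, Pig Latin, is a shift away from the SQL-like declarative style of programming (it suits the way programmers think), whereas Hive's query language closely resembles SQL.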
Pig sits on top of Hadoop and, in principle, could also sit on top of Dryad. I might be wrong, but Hive is closely coupled to Hadoop.
Both Pig Latin and Hive commands compile to map and reduce jobs.
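To make the last point concrete, here is a minimal Python sketch of how a grouped count (roughly what `SELECT dept, COUNT(*) ... GROUP BY dept` in Hive, or `GROUP` followed by `COUNT` in Pig, would ask for) breaks down into a map step, a shuffle, and a reduce step. The function names and the sample data are hypothetical, and this is an in-memory illustration of the idea, not what either compiler actually generates.

```python
from collections import defaultdict

def map_phase(records):
    """Emit one (key, 1) pair per input record, keyed by department."""
    for _name, dept in records:
        yield dept, 1

def shuffle(pairs):
    """Collect emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical sample data: (employee, department)
employees = [("ann", "eng"), ("bob", "ops"), ("cy", "eng")]
counts = reduce_phase(shuffle(map_phase(employees)))
print(counts)
```

Whether the query is written declaratively (Hive) or as a sequence of dataflow steps (Pig), the work that reaches the cluster has this general shape.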
My question: what is the goal of having both when one (say, Pig) could serve the purpose? Is it just because Pig is evangelized by Yahoo! and Hive by Facebook?
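Have a look at the Pig vs. Hive comparison in a nutshell from a "dezyre" article.
Hive is better than Pig in: partitions, server, web interface, and JDBC/ODBC support.
Some differences: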
Hive is best for structured data; Pig is best for semi-structured data
Hive is used for reporting; Pig is used for programming
Hive is a declarative, SQL-like language; Pig Latin is a procedural language
Hive supports partitions; Pig does not
Hive can start an optional Thrift-based server; Pig cannot
Hive defines tables beforehand (schema) and stores the schema information in a database; Pig has no dedicated metadata database
Hive does not support Avro but Pig does. EDIT: Hive does support Avro; specify the SerDe as org.apache.hadoop.hive.serde2.avro.AvroSerDe
Pig also has the COGROUP operator, which can be used to perform outer joins; Hive has no COGROUP equivalent, although it does support outer joins through LEFT, RIGHT, and FULL OUTER JOIN. Both Hive and Pig can join, order, and sort dynamically.
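For readers who have not met COGROUP, the following Python sketch shows the idea: two relations are grouped by key into one bag per input, and an outer join can then be derived by pairing the bags and padding an empty side with a null. The function names and sample data are hypothetical, and this only mimics the semantics in memory; it is not Pig's implementation.

```python
from collections import defaultdict

def cogroup(left, right):
    """Group two (key, value) relations by key, keeping one bag per input."""
    bags = defaultdict(lambda: ([], []))
    for key, value in left:
        bags[key][0].append(value)
    for key, value in right:
        bags[key][1].append(value)
    return dict(bags)

def full_outer_join(left, right):
    """Derive a full outer join from the cogrouped bags."""
    rows = []
    for key, (lbag, rbag) in sorted(cogroup(left, right).items()):
        # An empty bag on either side is padded with None, like a SQL NULL.
        for l in lbag or [None]:
            for r in rbag or [None]:
                rows.append((key, l, r))
    return rows

# Hypothetical sample relations keyed by an owner id
owners = [(1, "ann"), (2, "bob")]
pets = [(1, "cat"), (1, "dog"), (3, "fish")]

grouped = cogroup(owners, pets)
joined = full_outer_join(owners, pets)
print(grouped)
print(joined)
```

The intermediate `grouped` result is what distinguishes COGROUP from a plain join: the bags are kept separate per input, so a script can inspect or aggregate each side before deciding how (or whether) to flatten them.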