1. Test machine information:
[root@node2 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@node2 ~]# uname -r
3.10.0-327.el7.x86_64
2. Configure the EPEL repository and install OpenJDK with yum
yum search java | grep -i JDK
yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
3. Set the JAVA_HOME environment variable
[root@node2 ~]# cat /etc/profile.d/java_home.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
export PATH=$PATH:$JAVA_HOME/bin
Apply the configuration:
source /etc/profile.d/java_home.sh
or
. /etc/profile.d/java_home.sh
4. Check that Java is installed and configured correctly
[root@node2 ~]# java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
5. Create a small Java program, compile it, and print hello world
[root@node2 ~]# cat helloworld.java
public class helloworld {
    public static void main(String[] args){
        System.out.println("hello world!");
    }
}
[root@node2 ~]# javac helloworld.java    # compiling produces the class file helloworld.class
[root@node2 ~]# java helloworld          # run it
hello world!
#############################################################################
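As a complement to `java -version`, the following is a minimal sketch of a program that reports which runtime is actually executing it. The class name `JavaEnvCheck` is hypothetical and not part of the original walkthrough; `java.version` and `java.home` are standard JVM system properties, and `JAVA_HOME` is read from the environment.

```java
// Minimal sketch (hypothetical class name): print the running JVM's version,
// its installation directory, and the JAVA_HOME variable seen by the process.
class JavaEnvCheck {
    // Builds a short report from standard system properties and the environment.
    static String report() {
        String version = System.getProperty("java.version");
        String home = System.getProperty("java.home");
        String javaHomeEnv = System.getenv("JAVA_HOME"); // null if not exported
        return "java.version=" + version + "\n"
             + "java.home=" + home + "\n"
             + "JAVA_HOME=" + (javaHomeEnv == null ? "(not set)" : javaHomeEnv);
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Saved as JavaEnvCheck.java, it can be compiled and run the same way as helloworld. On a JDK 8 installation, `java.home` normally points at the `jre` subdirectory of the JDK, so it is expected to differ slightly from `JAVA_HOME`.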
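What is Apache Hadoop?
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so that a highly available service can be delivered on top of a cluster of computers, each of which may fail.
Download the binary package from the official website, extract it under /usr/local, create a symlink named hadoop in the same directory, add it to the PATH variable, and apply the change.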
[jerry@node2 ~]$ cat /etc/profile.d/hadoop.sh
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
[root@node2 ~]# hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
 or    hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
  where CLASSNAME is a user-provided Java class

  OPTIONS is none or any of:

buildpaths                       attempt to add class files from build tree
--config dir                     Hadoop config directory
--debug                          turn on shell script debug mode
--help                           usage information
hostnames list[,of,host,names]   hosts to use in slave mode
hosts filename                   list of hosts to use in slave mode
loglevel level                   set the log4j level for this command
workers                          turn on worker mode

  SUBCOMMAND is one of:

    Admin Commands:

daemonlog     get/set the log level for each daemon

    Client Commands:

archive       create a Hadoop archive
checknative   check native Hadoop and compression libraries availability
classpath     prints the class path needed to get the Hadoop jar and the required libraries
conftest      validate configuration XML files
credential    interact with credential providers
distch        distributed metadata changer
distcp        copy file or directories recursively
dtutil        operations related to delegation tokens
envvars       display computed Hadoop environment variables
fs            run a generic filesystem user client
gridmix       submit a mix of synthetic job, modeling a profiled from production load
jar <jar>     run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath       prints the java.library.path
kdiag         Diagnose Kerberos Problems
kerbname      show auth_to_local principal conversion
key           manage keys via the KeyProvider
rumenfolder   scale a rumen input trace
rumentrace    convert logs into a rumen trace
s3guard       manage metadata on S3
trace         view and modify Hadoop tracing settings
version       print the version

    Daemon Commands:

kms           run KMS, the Key Management Server

SUBCOMMAND may print help when invoked w/o parameters or with -h.
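By default Hadoop is configured to run in non-distributed mode, as a single Java process, which is convenient for debugging. To get a feel for how Hadoop runs, you can execute one of the bundled examples. The grep example used here takes the files in the input folder as input, counts the occurrences of words matching the regular expression wo[a-z.]+, and writes the result to the output folder.
To run it again, delete the output folder first, because by default Hadoop does not overwrite existing result files.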
# cd /usr/local/hadoop/
# mkdir input
# cp etc/hadoop/*.xml input
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'wo[a-z.]+'
# cat output/*
1 work
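To make it clearer what the grep example computes, here is a minimal single-process sketch in plain Java that counts every regex match in a few lines of text and prints "count, match" rows with the highest count first. It is not Hadoop's implementation: the class name `LocalRegexCount`, its methods, and the sample lines are all hypothetical, and it uses only `java.util.regex` rather than the MapReduce API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical local stand-in for the idea behind the grep example:
// count each distinct string that matches a regular expression.
class LocalRegexCount {
    // Scans every line and tallies each matched substring.
    static Map<String, Integer> countMatches(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            while (matcher.find()) {
                counts.merge(matcher.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Formats rows as "count<TAB>match", highest count first, ties by match text.
    static List<String> format(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            rows.add(entry.getValue() + "\t" + entry.getKey());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample input, standing in for the files under input/.
        List<String> sample = Arrays.asList("work and more work", "hello world");
        for (String row : format(countMatches(sample, "wo[a-z.]+"))) {
            System.out.println(row);
        }
    }
}
```

The real job distributes the matching step across map tasks and sums the counts in reduce tasks; the sketch collapses both steps into one loop over an in-memory list.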
[root@node2 /usr/local/hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep /etc/passwd output 'root'
[root@node2 /usr/local/hadoop]# cat output/*
Reposted from: https://blog.51cto.com/mengyao/2106002