Calling a MapReduce job from a simple Java program

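I have been trying to call a MapReduce job from a simple Java program in the same package. I referenced the MapReduce jar file in my Java program and called it with the runJar(String args[]) method, passing the input and output paths for the MapReduce job as well. But the program didn't work.

How do I run a program where I just pass the input path, output path and jar path to its main method? Is it possible to run a MapReduce job (jar) that way? I want to do this because I want to run several MapReduce jobs one after another, with my Java program calling each job by referring to its jar file. If that is possible, I might as well use a simple servlet to do the calling and read the jobs' output files for graphing purposes.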

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

/**
 *
 * @author root
 */
import org.apache.hadoop.util.RunJar;
import java.util.*;

public class callOther {

    public static void main(String args[])throws Throwable
    {

        ArrayList arg=new ArrayList();

        String output="/root/Desktp/output";

        arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");

        arg.add("/root/Desktop/input");
        arg.add(output);

        RunJar.main((String[])arg.toArray(new String[0]));

    }
}

Answers

==============================

1. Oh, please don't do it with runJar; the Java API is very good.

See how you can start a job from normal code:

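import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// create a configuration
Configuration conf = new Configuration();
// create a new job based on the configuration
// (this constructor is deprecated in Hadoop 2, where Job.getInstance(conf) replaces it)
Job job = new Job(conf);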
// here you have to put your mapper class
job.setMapperClass(Mapper.class);
// here you have to put your reducer class
job.setReducerClass(Reducer.class);
// here you have to set the jar which is containing your 
// map/reduce class, so you can use the mapper class
job.setJarByClass(Mapper.class);
// key/value of your reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// this is setting the format of your input, can be TextInputFormat
job.setInputFormatClass(SequenceFileInputFormat.class);
// same with output
job.setOutputFormatClass(TextOutputFormat.class);
// here you can set the path of your input
SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
// this deletes possible output paths to prevent job failures
FileSystem fs = FileSystem.get(conf);
Path out = new Path("files/out/processed/");
fs.delete(out, true);
// finally set the empty out path
TextOutputFormat.setOutputPath(job, out);

// this waits until the job completes and prints debug out to STDOUT or whatever
// has been configured in your log4j properties.
job.waitForCompletion(true);
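
Since the question is about running several jobs one after another: `waitForCompletion` returns `true` only when the job succeeded, so a driver can stop the chain at the first failure. The following is a minimal sketch of that idea in plain Java, not part of the original answer. `JobChain` and the step names are hypothetical, and each `Callable<Boolean>` stands in for a configured job, whose step would be something like `() -> job.waitForCompletion(true)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical helper: runs named steps in order and stops at the first failure. */
final class JobChain {
    // LinkedHashMap keeps the steps in the order they were added.
    private final Map<String, Callable<Boolean>> steps = new LinkedHashMap<>();

    /** Registers a step; with Hadoop the step would wrap job.waitForCompletion(true). */
    JobChain add(String name, Callable<Boolean> step) {
        steps.put(name, step);
        return this;
    }

    /** Returns the name of the first step that failed or threw, or null if all succeeded. */
    String runAll() {
        for (Map.Entry<String, Callable<Boolean>> entry : steps.entrySet()) {
            boolean succeeded;
            try {
                succeeded = entry.getValue().call();
            } catch (Exception e) {
                succeeded = false; // treat an exception like a failed job
            }
            if (!succeeded) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

A caller would build the chain once, for example `new JobChain().add("count", countStep).add("sort", sortStep).runAll()`, and report the returned step name if it is not null. Steps after a failed one are never started, which matters when a later job reads the output of an earlier one.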

If you are using an external cluster, you have to put the following information into your configuration:

// this should be like defined in your mapred-site.xml
conf.set("mapred.job.tracker", "jobtracker.com:50001"); 
// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");

This should be no problem when hadoop-core.jar is on your application container's classpath. But I think you should put some kind of progress indicator on your web page, because a Hadoop job may take minutes to hours to complete ;)
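
One way to provide a progress indicator on the web page is to start the job on a background thread and let the page poll for its status. The sketch below uses only the JDK and is an illustration, not the answer's own code: `BackgroundJobTracker` and `LongRunningJob` are hypothetical names, and the stand-in job reports its own percentage. With Hadoop, the corresponding calls on `Job` would be `submit()`, `mapProgress()`, `reduceProgress()` and `isComplete()`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical single-use helper: runs one long task in the background and exposes its progress. */
final class BackgroundJobTracker {
    /** Stand-in for the real job; it updates percentDone while it works. */
    interface LongRunningJob {
        boolean run(AtomicInteger percentDone) throws Exception;
    }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicInteger percentDone = new AtomicInteger(0);
    private volatile Future<Boolean> result;

    /** Starts the job and returns immediately, so a request thread is not blocked. */
    void start(LongRunningJob job) {
        Callable<Boolean> task = () -> job.run(percentDone);
        result = executor.submit(task);
        executor.shutdown(); // accept no further tasks; the submitted one still runs to completion
    }

    /** Value a status page or polling endpoint would display. */
    int percentDone() {
        return percentDone.get();
    }

    boolean isDone() {
        Future<Boolean> r = result;
        return r != null && r.isDone();
    }

    /** Blocks until the job ends; false if it failed, threw, or was never started. */
    boolean awaitSuccess() {
        Future<Boolean> r = result;
        if (r == null) {
            return false;
        }
        try {
            return r.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }
}
```

In a servlet setting, the tracker would typically be stored where a later request can find it (for example in the session), with one request calling `start` and subsequent requests reading `percentDone()` and `isDone()`.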

YARN (Hadoop 2 and later)

For YARN, the following configuration needs to be set:

// this should be like defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001"); 

// framework is now "yarn", should be defined like this in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");

// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");

==============================

2. Calling a MapReduce job from a Java web application (Servlet)

You can call a MapReduce job from a web application using the Java API. Here is a small example of calling a MapReduce job from a servlet. The steps are as follows:

Step 1: First create a MapReduce driver servlet class, and also develop the map and reduce classes. Here is a sample code snippet:

CallJobFromServlet.java

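import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class CallJobFromServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        Configuration conf = new Configuration();
        Job job = new Job(conf, "Job Name");
        // Replace CallJobFromServlet.class with your servlet class
        job.setJarByClass(CallJobFromServlet.class);
        job.setMapperClass(Map.class);     // Replace Map.class with your Mapper class
        job.setReducerClass(Reduce.class); // Replace Reduce.class with your Reducer class
        job.setNumReduceTasks(30);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Job input path
        FileInputFormat.addInputPath(job,
                new Path("hdfs://localhost:54310/user/hduser/input/"));
        // Job output path
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://localhost:54310/user/hduser/output"));

        // waitForCompletion declares checked exceptions that doPost may not throw,
        // so they are wrapped in a ServletException here
        try {
            job.waitForCompletion(true);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new ServletException("MapReduce job was interrupted", e);
        } catch (ClassNotFoundException e) {
            throw new ServletException("MapReduce job classes could not be loaded", e);
        }
    }
}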

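Step 2: Place all related jar files (Hadoop and application-specific jars) inside the lib folder of the web server (e.g. Tomcat). This is mandatory for accessing the Hadoop configuration (the hadoop 'conf' folder holds the configuration XML files, i.e. core-site.xml, hdfs-site.xml, etc.). Just copy the jars from the hadoop lib folder to the web server (Tomcat) lib directory. The list of jar names is as follows: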

1.  commons-beanutils-1.7.0.jar
2.  commons-beanutils-core-1.8.0.jar
3.  commons-cli-1.2.jar
4.  commons-collections-3.2.1.jar
5.  commons-configuration-1.6.jar
6.  commons-httpclient-3.0.1.jar
7.  commons-io-2.1.jar
8.  commons-lang-2.4.jar
9.  commons-logging-1.1.1.jar
10. hadoop-client-1.0.4.jar
11. hadoop-core-1.0.4.jar
12. jackson-core-asl-1.8.8.jar
13. jackson-mapper-asl-1.8.8.jar
14. jersey-core-1.8.jar

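Step 3: Deploy your web application to the web server (for Tomcat, in the 'webapps' folder).

Step 4: Create a JSP file and link the servlet class (CallJobFromServlet.java) in the form's action attribute. The servlet has to be mapped to the URL used there, for example in web.xml. Here is a sample code snippet: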

Index.jsp

<form id="trigger_hadoop" name="trigger_hadoop" method="post" action="./CallJobFromServlet">
      <span class="back">Trigger Hadoop Job from Web Page</span>
      <input type="submit" name="submit" value="Trigger Job" />
</form>

==============================
3. Another way, for jobs already implemented in the hadoop examples, which also requires the hadoop jars to be imported: just call the static main function of the desired job class with the appropriate String[] of arguments.
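
As a minimal sketch of calling a job class's static `main`, the caller below passes an input and an output path as the `String[]` arguments. `ExampleJob` and `CallJobMain` are hypothetical names, and `ExampleJob` is only a stand-in so the snippet runs without Hadoop; in practice it would be a class from the Hadoop examples jar. One thing to check first: a `main` that ends with `System.exit(...)`, as the bundled WordCount example does, will terminate the calling JVM as well.

```java
/** Hypothetical stand-in for a job class that exposes the usual static main. */
final class ExampleJob {
    // Records the arguments so the call can be observed; a real job has no such field.
    static String[] lastArgs = new String[0];

    public static void main(String[] args) {
        // A real job would configure and submit itself here.
        lastArgs = args.clone();
    }
}

/** Caller that invokes the job's main directly instead of going through RunJar. */
final class CallJobMain {
    static void run(String inputPath, String outputPath) {
        ExampleJob.main(new String[] { inputPath, outputPath });
    }
}
```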
==============================
4. Because map and reduce run on different machines, all referenced classes and jars must be moved from machine to machine.

If you have a packaged jar and run it on your desktop, @ThomasJungblut's answer is OK. But if you run from Eclipse by right-clicking your class and choosing Run, it doesn't work.

Instead of:
```
job.setJarByClass(Mapper.class);
```
use:
```
job.setJar("build/libs/hdfs-javac-1.0.jar");
```
At the same time, the jar's manifest must include a Main-Class attribute that names your main class.

For gradle users, you can put these lines in build.gradle:
```
jar {
    manifest {
        attributes("Main-Class": mainClassName)
    }
}
```
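
To confirm that a built jar really carries the Main-Class attribute this answer asks for, the manifest can be read with the JDK's `java.util.jar` classes. This is a small sketch rather than anything from the original answer; `ManifestCheck` is a hypothetical name, and the jar path you pass in would be your own build output.

```java
import java.io.IOException;
import java.util.Optional;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper for inspecting the Main-Class entry of a jar manifest. */
final class ManifestCheck {
    /** Returns the Main-Class attribute of a manifest, or empty if it is missing. */
    static Optional<String> mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return Optional.empty(); // the jar has no manifest at all
        }
        return Optional.ofNullable(
                manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS));
    }

    /** Opens the jar at the given path; empty if it cannot be read or has no Main-Class. */
    static Optional<String> mainClassOfJar(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            return mainClassOf(jar.getManifest());
        } catch (IOException e) {
            return Optional.empty();
        }
    }
}
```

For example, `ManifestCheck.mainClassOfJar("build/libs/hdfs-javac-1.0.jar")` returns an empty `Optional` when the `manifest` block is missing from build.gradle or the jar has not been built yet.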
==============================
5. I can't think of many ways to do this without involving the hadoop-core library (or indeed, as @ThomasJungblut said, why you would want to).

But if you absolutely must, you could set up an Oozie server with a workflow for your job, and then use the Oozie web service interface to submit the workflow to Hadoop.

Again, this seems like a lot of work for something that could simply be resolved using Thomas's answer (include the hadoop-core jar and use his code snippet).
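
For a rough idea of what submitting through the Oozie web service involves, the sketch below only prepares the request; it does not send it. It assumes, and this should be verified against the Oozie Web Services API documentation for your version, that a job is submitted by an HTTP POST of a Hadoop-style configuration XML to `/v1/jobs` under the Oozie base URL, with `action=start` to start it immediately. `OozieSubmission`, the host names, the user and the workflow path are all hypothetical.

```java
import java.net.URI;
import java.util.Map;

/** Hypothetical helper that prepares (but does not send) an Oozie job submission. */
final class OozieSubmission {
    /** Assumed endpoint shape: <base>/v1/jobs?action=start; check your Oozie version's docs. */
    static URI startJobEndpoint(String oozieBaseUrl) {
        return URI.create(oozieBaseUrl + "/v1/jobs?action=start");
    }

    /** Renders properties as a Hadoop-style <configuration> document for the request body. */
    static String configurationXml(Map<String, String> properties) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<configuration>\n");
        for (Map.Entry<String, String> property : properties.entrySet()) {
            xml.append("  <property><name>").append(escape(property.getKey()))
               .append("</name><value>").append(escape(property.getValue()))
               .append("</value></property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    // Minimal escaping of XML text content; '&' must be replaced first.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The properties would normally include at least the submitting user (`user.name`) and the HDFS location of the workflow (`oozie.wf.application.path`); the resulting XML can then be posted to the endpoint with any HTTP client, using an XML content type.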

==============================

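6. You can do it this way:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class Test {

    public static void main(String[] args) throws Exception {
        // YourJob is your own driver class; it must implement org.apache.hadoop.util.Tool
        int res = ToolRunner.run(new Configuration(), new YourJob(), args);
        System.exit(res);
    }
}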

from https://stackoverflow.com/questions/9849776/calling-a-mapreduce-job-from-a-simple-java-program by cc-by-sa and MIT license
