
Spring Batch - File Parallel Processing


Spring Framework - Spring Batch

Written by Ramana (SCJP, SCWCD, SCDJWS)
Tuesday, 19 January 2010 23:11

In this article I am going to explain how we can develop a Spring Batch application that processes a huge file by partitioning it into several small files and processing all of the generated files in parallel. Using this approach, we can speed up file processing.

Spring Batch provides a lot of ready-made components that we can use right out of the box for this purpose.

I am assuming that you have already read my Spring Batch hello world article and have set up your project in Eclipse.

For this job, we can take my earlier file-to-file Spring Batch job XML file and convert it to do parallel processing. The first step is to partition the large input file into multiple small files; there are several utilities that can do that. Then we feed these small files in as input and start the same number of step executions, so that each one takes one file and produces the corresponding output file. Because the files are processed concurrently rather than one after another, total processing time can drop substantially when enough CPU and disk capacity is available.
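The splitting step described above can be sketched in plain Java. This is only an illustration, assuming Java 8 or later and a line-oriented CSV input with no header row; the class name LineFileSplitter, the method split and the naming scheme prefix1.csv, prefix2.csv, ... are hypothetical and not part of Spring Batch or of the original job.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one large text file into numbered smaller files.
final class LineFileSplitter {

    private LineFileSplitter() {
    }

    // Writes at most linesPerFile lines into each part and returns the parts in order.
    static List<Path> split(Path input, Path outputDir, String prefix, int linesPerFile)
            throws IOException {
        if (linesPerFile <= 0) {
            throw new IllegalArgumentException("linesPerFile must be positive");
        }
        Files.createDirectories(outputDir);
        List<Path> parts = new ArrayList<>();
        BufferedWriter writer = null;
        int linesInCurrentPart = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Start a new part for the first line and whenever the current part is full.
                if (writer == null || linesInCurrentPart == linesPerFile) {
                    if (writer != null) {
                        writer.close();
                    }
                    Path part = outputDir.resolve(prefix + (parts.size() + 1) + ".csv");
                    writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    parts.add(part);
                    linesInCurrentPart = 0;
                }
                writer.write(line);
                writer.newLine();
                linesInCurrentPart++;
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Paths and chunk size are example values only.
        List<Path> parts = split(Paths.get("c:/data/input/player.csv"),
                Paths.get("c:/data/input/splitfiles"), "player", 10000);
        System.out.println("Created " + parts.size() + " files");
    }
}
```

The chunk size decides how many partitions the job will later see, so it is worth choosing it with the number of worker threads in mind.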
 
We can use the same fileItemReader, fileItemWriter and fileItemProcessor. The only change is how we set the resource. In our original file-to-file job, we configured the input file location and the output file location in the job definition itself. Now we instead let them be resolved at runtime by reading the values from the StepExecutionContext, as shown below.
 
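Before
<beans:property name="resource" value="file:c:\data\output\output.txt" />
 
Now
<beans:property name="resource" value="#{stepExecutionContext[outputFile]}" />
 
outputFile is the key into the StepExecutionContext. Late-binding expressions like this one are resolved when the step starts and require the bean to be step-scoped (scope="step").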
 
We will use a Spring Batch class called org.springframework.batch.core.partition.support.SimpleStepExecutionSplitter to split the step into multiple step executions, one per partition.
 
The partitioner passed to the splitter (configured in the job XML below) stores the name of the input file for each partition in that partition's StepExecutionContext under the key "fileName".
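To make the idea of "one execution context per input file" concrete, here is a conceptual sketch in plain Java, assuming Java 8 or later. It uses ordinary maps instead of the Spring Batch ExecutionContext class, and the class name PartitionSketch as well as the partition names partition0, partition1, ... are illustrative assumptions, not a description of the framework's internals.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: each partition gets its own small context map.
final class PartitionSketch {

    private PartitionSketch() {
    }

    // Builds one context per file and stores the file location under keyName.
    static Map<String, Map<String, String>> partition(List<String> fileUrls, String keyName) {
        Map<String, Map<String, String>> partitions = new LinkedHashMap<>();
        int index = 0;
        for (String url : fileUrls) {
            Map<String, String> context = new LinkedHashMap<>();
            context.put(keyName, url);
            partitions.put("partition" + index, context);
            index++;
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> partitions = partition(Arrays.asList(
                "file:c:/data/input/splitfiles/player1.csv",
                "file:c:/data/input/splitfiles/player2.csv"), "fileName");
        // Each entry is what one step execution would later read its input file from.
        partitions.forEach((name, context) -> System.out.println(name + " -> " + context));
    }
}
```

In the real job, a reader configured with #{stepExecutionContext[fileName]} picks up exactly this per-partition value.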
 
We will use another Spring Batch class called org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler to run those step executions on the threads of a task executor.
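The general pattern of handing one unit of work per file to a pool of threads and waiting for all of them can be sketched with java.util.concurrent alone. This is an analogy under stated assumptions (Java 8 or later), not the implementation of the partition handler; the class ParallelPartitionSketch and the placeholder method process are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual analogy: one task per input file, executed on a fixed thread pool.
final class ParallelPartitionSketch {

    private ParallelPartitionSketch() {
    }

    // Placeholder for the real read-process-write work done for one file.
    static String process(String inputFile) {
        return "processed " + inputFile;
    }

    // Submits one task per file and blocks until every task has finished.
    static List<String> runAll(List<String> inputFiles, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String file : inputFiles) {
                tasks.add(() -> process(file));
            }
            List<String> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the submitted tasks.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(Arrays.asList("player1.csv", "player2.csv", "player3.csv"), 3));
    }
}
```

As in the partitioned job, the overall step is finished only when the slowest file has been processed.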
  
Using a step listener class, we can hook into various stages of the job, e.g. before a step, after a step, etc.
In our case, we will use the beforeStep method, where we write the logic to work out the name of the output file and store the information required by each partition in the StepExecutionContext. At run time, each partition reads this information from its own StepExecutionContext and acts accordingly.
 
OutputFileListener.java
package com.ecomputercoach.file.partition;

import org.apache.commons.io.FilenameUtils;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.listener.StepListenerSupport;
import org.springframework.batch.item.ExecutionContext;

// Derives the output file location for each partition before its step runs.
@SuppressWarnings("unchecked")
public class OutputFileListener extends StepListenerSupport {


    // Key under which the generated output file location is stored.
    private String outputKeyName = "outputFile";

    // Key under which the input file location was stored for this partition.
    private String inputKeyName = "fileName";

    public void setOutputKeyName(String outputKeyName) {
        this.outputKeyName = outputKeyName;
    }

    public void setInputKeyName(String inputKeyName) {
        this.inputKeyName = inputKeyName;
    }

    // Runs before each partition's step and derives the output file from the input file name.
    public void beforeStep(StepExecution stepExecution) {
        ExecutionContext executionContext = stepExecution.getExecutionContext();
        if (executionContext.containsKey(inputKeyName)
                && !executionContext.containsKey(outputKeyName)) {
            String inputName = executionContext.getString(inputKeyName);
            // Same base name as the input file, written to the output folder as .csv.
            executionContext.putString(outputKeyName, "file:c:/data/output/"
                    + FilenameUtils.getBaseName(inputName) + ".csv");
        }
    }

}
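To see what the listener computes without starting a job, the name derivation can be reproduced in plain Java. This is a simplified stand-in, assuming Java 8 or later: the class OutputNameSketch and its methods are hypothetical, and baseName only approximates what FilenameUtils.getBaseName does for ordinary paths such as the ones used in this article.

```java
// Hypothetical stand-in for the listener's naming logic, without Spring Batch or commons-io.
final class OutputNameSketch {

    private OutputNameSketch() {
    }

    // Returns the file name without its directory part and without its extension.
    static String baseName(String path) {
        int lastSeparator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String fileName = path.substring(lastSeparator + 1);
        int lastDot = fileName.lastIndexOf('.');
        return lastDot < 0 ? fileName : fileName.substring(0, lastDot);
    }

    // Mirrors the expression used in beforeStep to build the output location.
    static String outputFor(String inputName) {
        return "file:c:/data/output/" + baseName(inputName) + ".csv";
    }

    public static void main(String[] args) {
        // prints file:c:/data/output/player1.csv
        System.out.println(outputFor("file:c:/data/input/splitfiles/player1.csv"));
    }
}
```

Because every input file has a distinct base name, every partition writes to its own output file and the writers never collide.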
 

  
FileToFile_PartitioningJob.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/batch"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xmlns:p="http://www.springframework.org/schema/p"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
    http://www.springframework.org/schema/batch
    http://www.springframework.org/schema/batch/spring-batch-2.0.xsd
    http://www.springframework.org/schema/aop
    http://www.springframework.org/schema/aop/spring-aop-2.0.xsd
    http://www.springframework.org/schema/tx
    http://www.springframework.org/schema/tx/spring-tx-2.0.xsd">

<!--<beans:import resource="MEMORY-JOBREPOSITORY.xml"/> -->
<beans:import resource="DB-JOBREPOSITORY.xml"/>

<beans:bean id="playerFileItemReader"
    class="org.springframework.batch.item.file.FlatFileItemReader" scope=
  <beans:property name="resource"
      value="#{stepExecutionContext[fileName]}" />
  <beans:property name="strict" value="false" />
  <!-- <beans:property name="resource"
      value="file:c:\data\input\player.csv" /> -->
  <beans:property name="lineMapper">
    <beans:bean
        class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
      <beans:property name="lineTokenizer">
        <beans:bean
            class="org.springframework.batch.item.file.transform.DelimitedLineToke
          <beans:property name="d
              value=","/>
          <beans:property name="n
              value="ID,lastName,firstName,position,debutYear,finalYear" />
        </beans:bean>
      </beans:property>
      <beans:property name="fieldSetMapper">
        <beans:bean
            class="com.ecomputercoach.file.PlayerFieldSetMapper" />
      </beans:property>
    </beans:bean>
  </beans:property>
</beans:bean>

<beans:bean id="careerProcessor"
    class="com.ecomputercoach.file.CareerProcessor" scope="step"/>
 


<beans:bean id="playerFileItemWriter"
    class="org.springframework.batch.item.file.FlatFileItemWriter" scope=
  <!-- <beans:property name="resource"
      value="file:c:\data\output\output.txt" /> -->
  <beans:property name="resource"
      value="#{stepExecutionContext[outputFile]}" />
  <beans:property name="shouldDeleteIfExists" value="tru
  <beans:property name="lineAggregator">
    <beans:bean
        class="org.springframework.batch.item.file.transform.DelimitedLineAggr
      <beans:property name="delimiter" value
      <beans:property name="fieldExtractor">
        <beans:bean
            class="org.springframework.batch.item.file.transform.BeanWrapperFieldE
          <beans:property name="n
              value="fullName,careerLength"/>
        </beans:bean>
      </beans:property>
    </beans:bean>
  </beans:property>
</beans:bean>

<beans:bean name="step1:master"
    class="org.springframework.batch.core.partition.support.PartitionStep"
  <beans:property name="jobRepository" ref="jobRepositor
  <beans:property name="stepExecutionSplitter">
    <beans:bean
        class="org.springframework.batch.core.partition.support.SimpleStepExec
      <beans:constructor-arg ref="jobReposit
      <beans:constructor-arg ref="step1" />
      <beans:constructor-arg>
        <beans:bean
            class="org.springframework.batch.core.partition.support.MultiResourceP
          <beans:property name="r
              value="file:c:/data/input/splitfiles/player*.csv" />
        </beans:bean>
      </beans:constructor-arg>
    </beans:bean>
  </beans:property>
  <beans:property name="partitionHandler">
    <beans:bean
        class="org.springframework.batch.core.partition.support.TaskExecutorPa
      <beans:property name="taskExecutor"
          ref="asyncTaskExecutor" />

 
You can download the data file player.csv and place it in your c:\data\input\splitfiles folder. To get several input files, create multiple copies of the same file and name them so that they match the pattern player*.csv, for example player1.csv, player2.csv, player3.csv, player4.csv and player5.csv.
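If you prefer not to copy the files by hand, the copies can be created with a few lines of Java. This is a small convenience sketch, assuming Java 8 or later; the class SampleDataCopier is hypothetical, and the source location c:/data/input/player.csv is an assumption about where the downloaded file was saved.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that creates player1.csv .. playerN.csv from one sample file.
final class SampleDataCopier {

    private SampleDataCopier() {
    }

    // Copies source into targetDir the given number of times and returns the created files.
    static List<Path> makeCopies(Path source, Path targetDir, int copies) throws IOException {
        Files.createDirectories(targetDir);
        List<Path> created = new ArrayList<>();
        for (int i = 1; i <= copies; i++) {
            Path target = targetDir.resolve("player" + i + ".csv");
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            created.add(target);
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        makeCopies(Paths.get("c:/data/input/player.csv"),
                Paths.get("c:/data/input/splitfiles"), 5);
    }
}
```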

We can see the following saved to the BATCH_STEP_EXECUTION_CONTEXT table in the repository.

 
Here is the screen shot of my folder structure.


 
For other source files, you can refer to my file to file job article.
 
Good luck! Thanks for visiting EComputerCoach.
 
Attachments:
File File size

CareerProcessor.java 0 Kb
FileToFile_PartitioningJob.xml 4 Kb
OutputFileListener.java 1 Kb
Player.java 1 Kb
PlayerFieldSetMapper.java 0 Kb

Last Updated on Wednesday, 17 March 2010 18:49


 

Comments    

 
# 2010-03-25 03:15 +1
Thanks for the wonderful article. It would be great if you could upload the full code.
 

 
# Administrator 2010-03-25 17:33 +1
sure..
 

 
# 2010-12-29 01:03 0
Superb one... I need an idea for the scenario below.

A job should take 4 input files and generate one output file. Please help me with that.
 

 
# Administrator 2010-12-29 22:04 0
Hello Pethaperumal,
This can be solved in multiple ways, depending on the content of each file and the similarities among them.

If the format of the content in each file is the same, append all the files together into one big file, which is then fed as input to the batch job.

If the format of the content in each file is different, create a job with 5 steps, where each of the first four steps takes one file as input, processes the data and loads it into a temp table in the database.

The final step reads the data from the temp table as input and creates a file as output.

If you want to speed up the process, there are other options.

Hope this helps.


-Ramana
 

 
# 2011-01-11 06:27 0
Thanks for the reply. I will get back if I have any doubts.
 

 
# zulu 2011-04-27 05:32 0
I have a file with more than 1 million records, approximately 400 MB in size. I am getting a heap out-of-memory error while reading the flat file. Please suggest how to resolve this. Thanks in advance.
 

 
# Administrator 2011-04-27 10:48 0
Try changing your commit interval to a smaller value. Please understand that until you commit to the target, the data is held in memory. So commit data regularly.

Hope this helps.


-Ramana
 

 
# zulu 2011-04-28 03:55 0
I am getting the same error after changing the commit-interval to 10.
Here is the complete scenario.
1. I read the flat file through FlatFileItemReader.
2. In the writer, I extend ItemWriter and write the data into the queue.

My question: will the data written into the queue still exist in the heap?
Where will the flush of the list of objects in the write data happen?
 

 
# Administrator 2011-04-28 15:50 0
Hi Zulu,
Flush should happen after you write to the queue. I would suggest looking into the FlatFileItemWriter source code to understand the flushing part better.

-Ramana
 

Copyright © 2011 EComputerCoach. All Rights Reserved.


http://www.ecomputercoach.com/index.php/component/content/article/53-spring-batch-file-parallel-processing.html?showall=1[09/05/2011 16:34:17]
