Class AbstractBatchedColumnProcessor<T extends Context>
java.lang.Object
com.univocity.parsers.common.processor.core.AbstractBatchedColumnProcessor<T>
- All Implemented Interfaces:
BatchedColumnReader<String>,ColumnReader<String>,Processor<T>
- Direct Known Subclasses:
BatchedColumnProcessor
public abstract class AbstractBatchedColumnProcessor<T extends Context>
extends Object
implements Processor<T>, BatchedColumnReader<String>
A Processor implementation that stores values of columns in batches. Use this implementation in favor of AbstractColumnProcessor when processing large inputs to avoid running out of memory.
Values parsed in each row will be split into columns of Strings. Each column has its own list of values.
During the execution of the process, the batchProcessed(int) method will be invoked after a given number of rows has been processed.
The user can access the lists with values parsed for all columns using the methods getColumnValuesAsList(),
getColumnValuesAsMapOfIndexes() and getColumnValuesAsMapOfNames().
After batchProcessed(int) is invoked, all values will be discarded and the next batch of column values will be accumulated.
This process repeats until there are no more rows in the input.
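The batching cycle described above can be sketched with plain JDK classes. This is a simplified stand-in written for illustration, not the library's implementation: the class names BatchingSketch and BatchingSketchDemo are hypothetical, the context argument is omitted, and delivering a partial final batch when the input ends is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for illustration only; this is NOT the library's class.
abstract class BatchingSketch {
    private final int rowsPerBatch;
    private final List<List<String>> columns = new ArrayList<>();
    private int batchCount = 0;       // rows accumulated in the current batch
    private int batchesProcessed = 0; // batches delivered so far

    BatchingSketch(int rowsPerBatch) {
        if (rowsPerBatch < 1) {
            throw new IllegalArgumentException("rowsPerBatch must be positive");
        }
        this.rowsPerBatch = rowsPerBatch;
    }

    // Callback invoked once per batch, mirroring batchProcessed(int).
    abstract void batchProcessed(int rowsInThisBatch);

    List<List<String>> getColumnValuesAsList() { return columns; }

    int getBatchesProcessed() { return batchesProcessed; }

    // Splits one row into per-column lists and fires the callback when the batch is full.
    void rowProcessed(String[] row) {
        while (columns.size() < row.length) {
            // A column seen for the first time is padded with nulls for earlier rows of this batch.
            List<String> column = new ArrayList<>();
            for (int i = 0; i < batchCount; i++) {
                column.add(null);
            }
            columns.add(column);
        }
        for (int i = 0; i < columns.size(); i++) {
            columns.get(i).add(i < row.length ? row[i] : null);
        }
        batchCount++;
        if (batchCount >= rowsPerBatch) {
            flush();
        }
    }

    // Assumption of this sketch: a partial final batch is delivered when the input ends.
    void processEnded() {
        if (batchCount > 0) {
            flush();
        }
    }

    // Delivers the current batch, then discards its values.
    private void flush() {
        batchProcessed(batchCount);
        for (List<String> column : columns) {
            column.clear();
        }
        batchCount = 0;
        batchesProcessed++;
    }
}

class BatchingSketchDemo {
    // Feeds three rows through a processor with two rows per batch and records each callback.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        BatchingSketch processor = new BatchingSketch(2) {
            @Override
            void batchProcessed(int rowsInThisBatch) {
                // Values must be consumed here; they are discarded after the callback returns.
                log.add(rowsInThisBatch + ":" + getColumnValuesAsList());
            }
        };
        processor.rowProcessed(new String[]{"a", "1"});
        processor.rowProcessed(new String[]{"b", "2"});
        processor.rowProcessed(new String[]{"c", "3"});
        processor.processEnded();
        return log;
    }
}
```

In this sketch, memory use is bounded by rowsPerBatch rows rather than by the size of the input, which is the reason to prefer batching for large inputs.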
Field Summary
Fields
Modifier and Type, Field
private int batchCount
private int batchesProcessed
private final int rowsPerBatch
private final ColumnSplitter<String> splitter
-
Constructor Summary
Constructors
AbstractBatchedColumnProcessor(int rowsPerBatch): Constructs a batched column processor configured to invoke the batchProcessed(int) method after a given number of rows has been processed.
-
Method Summary
Modifier and Type, Method, Description
abstract void batchProcessed(int rowsInThisBatch): Callback to the user, where the lists with values parsed for all columns can be accessed using the methods ColumnReader.getColumnValuesAsList(), ColumnReader.getColumnValuesAsMapOfIndexes() and ColumnReader.getColumnValuesAsMapOfNames().
int getBatchesProcessed(): Returns the number of batches already processed.
getColumn(int columnIndex): Returns the values of a given column.
getColumn(String columnName): Returns the values of a given column.
getColumnValuesAsList(): Returns the values processed for each column.
getColumnValuesAsMapOfIndexes(): Returns a map of column indexes and their respective list of values parsed from the input.
getColumnValuesAsMapOfNames(): Returns a map of column names and their respective list of values parsed from the input.
final String[] getHeaders(): Returns the column headers.
int getRowsPerBatch(): Returns the number of rows processed in each batch.
void processEnded(T context): This method will be invoked by the parser once, after the parsing process stopped and all resources were closed.
void processStarted(T context): This method will be invoked by the parser once, when it is ready to start processing the input.
final void putColumnValuesInMapOfIndexes(Map<Integer, List<String>> map): Fills a given map associating each column index to its list of values.
final void putColumnValuesInMapOfNames(Map<String, List<String>> map): Fills a given map associating each column name to its list of values.
void rowProcessed(String[] row, T context): Invoked by the parser after all values of a valid record have been processed.
-
Field Details
-
splitter
-
rowsPerBatch
private final int rowsPerBatch
-
batchCount
private int batchCount
-
batchesProcessed
private int batchesProcessed
-
-
Constructor Details
-
AbstractBatchedColumnProcessor
public AbstractBatchedColumnProcessor(int rowsPerBatch)
Constructs a batched column processor configured to invoke the batchProcessed(int) method after a given number of rows has been processed.
- Parameters:
rowsPerBatch - the number of rows to process in each batch.
-
-
Method Details
-
processStarted
Description copied from interface: Processor
This method will be invoked by the parser once, when it is ready to start processing the input.
- Specified by:
processStarted in interface Processor<T extends Context>
- Parameters:
context - a contextual object with information and controls over the current state of the parsing process
-
rowProcessed
Description copied from interface: Processor
Invoked by the parser after all values of a valid record have been processed.
- Specified by:
rowProcessed in interface Processor<T extends Context>
- Parameters:
row - the data extracted by the parser for an individual record. Note that:
- it will never be null.
- it will never be empty unless explicitly configured using CommonSettings.setSkipEmptyLines(boolean)
- it won't contain lines identified by the parser as comments. To disable comment processing, set Format.setComment(char) to '\0'
context - a contextual object with information and controls over the current state of the parsing process
-
processEnded
Description copied from interface: Processor
This method will be invoked by the parser once, after the parsing process has stopped and all resources have been closed.
It will always be called by the parser: in case of errors, if the end of the input is reached, or if the user stopped the process manually using Context.stop().
- Specified by:
processEnded in interface Processor<T extends Context>
- Parameters:
context - a contextual object with information and controls over the state of the parsing process
-
getHeaders
Description copied from interface: ColumnReader
Returns the column headers. These are either the headers defined in CommonSettings.getHeaders() or the headers parsed from the input when header extraction is enabled (CommonParserSettings.isHeaderExtractionEnabled() returns true).
- Specified by:
getHeaders in interface ColumnReader<T extends Context>
- Returns:
the headers of all columns parsed.
-
getColumnValuesAsList
Description copied from interface: ColumnReader
Returns the values processed for each column.
- Specified by:
getColumnValuesAsList in interface ColumnReader<T extends Context>
- Returns:
a list of lists. The stored lists correspond to the position of the column processed from the input; each list contains the corresponding values parsed for a column, across multiple rows.
-
putColumnValuesInMapOfNames
Description copied from interface: ColumnReader
Fills a given map associating each column name to its list of values.
- Specified by:
putColumnValuesInMapOfNames in interface ColumnReader<T extends Context>
- Parameters:
map - the map to hold the values of each column
-
putColumnValuesInMapOfIndexes
Description copied from interface: ColumnReader
Fills a given map associating each column index to its list of values.
- Specified by:
putColumnValuesInMapOfIndexes in interface ColumnReader<T extends Context>
- Parameters:
map - the map to hold the values of each column
-
getColumnValuesAsMapOfNames
Description copied from interface: ColumnReader
Returns a map of column names and their respective list of values parsed from the input.
- Specified by:
getColumnValuesAsMapOfNames in interface ColumnReader<T extends Context>
- Returns:
a map of column names and their respective list of values.
-
getColumnValuesAsMapOfIndexes
Description copied from interface: ColumnReader
Returns a map of column indexes and their respective list of values parsed from the input.
- Specified by:
getColumnValuesAsMapOfIndexes in interface ColumnReader<T extends Context>
- Returns:
a map of column indexes and their respective list of values.
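The relationship between the list of column lists and the two map views can be sketched as follows. This is an illustrative sketch using only JDK collections; the class ColumnMapsSketch and its methods are hypothetical and do not reproduce the library's code. Rejecting input with fewer headers than columns is an assumption of the sketch, not documented library behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the library.
class ColumnMapsSketch {

    // Associates each 0-based column index with its list of values.
    static Map<Integer, List<String>> byIndex(List<List<String>> columns) {
        Map<Integer, List<String>> map = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            map.put(i, columns.get(i));
        }
        return map;
    }

    // Associates each header with the list of values of the column at the same position.
    // Assumption of this sketch: having fewer headers than columns is treated as an error.
    static Map<String, List<String>> byName(String[] headers, List<List<String>> columns) {
        if (headers.length < columns.size()) {
            throw new IllegalArgumentException(
                "Expected at least " + columns.size() + " headers, got " + headers.length);
        }
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            map.put(headers[i], columns.get(i));
        }
        return map;
    }
}
```

A LinkedHashMap is used here so that iteration follows the column order of the input; this is a design choice of the sketch.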
-
getColumn
Description copied from interface: ColumnReader
Returns the values of a given column.
- Specified by:
getColumn in interface ColumnReader<T extends Context>
- Parameters:
columnName - the name of the column in the input.
- Returns:
a list with all data stored in the given column
-
getColumn
Description copied from interface: ColumnReader
Returns the values of a given column.
- Specified by:
getColumn in interface ColumnReader<T extends Context>
- Parameters:
columnIndex - the position of the column in the input (0-based).
- Returns:
a list with all data stored in the given column
-
getRowsPerBatch
public int getRowsPerBatch()
Description copied from interface: BatchedColumnReader
Returns the number of rows processed in each batch.
- Specified by:
getRowsPerBatch in interface BatchedColumnReader<T extends Context>
- Returns:
the number of rows per batch
-
getBatchesProcessed
public int getBatchesProcessed()
Description copied from interface: BatchedColumnReader
Returns the number of batches already processed.
- Specified by:
getBatchesProcessed in interface BatchedColumnReader<T extends Context>
- Returns:
the number of batches already processed
-
batchProcessed
public abstract void batchProcessed(int rowsInThisBatch)
Description copied from interface: BatchedColumnReader
Callback to the user, where the lists with values parsed for all columns can be accessed using the methods ColumnReader.getColumnValuesAsList(), ColumnReader.getColumnValuesAsMapOfIndexes() and ColumnReader.getColumnValuesAsMapOfNames().
- Specified by:
batchProcessed in interface BatchedColumnReader<T extends Context>
- Parameters:
rowsInThisBatch - the number of rows processed in the current batch. This corresponds to the number of elements in the list of each column.
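Because column values are discarded after the callback returns, anything that must survive across batches has to be aggregated or copied inside batchProcessed(int). The following is a minimal sketch of such an aggregator. The class RunningColumnTotal is hypothetical and not part of the library; it assumes the chosen column holds integer text and skips null or empty values.

```java
import java.util.List;

// Hypothetical helper, not part of the library: keeps a running total across batches.
class RunningColumnTotal {
    private final int columnIndex;
    private long total = 0;
    private long rowsSeen = 0;

    RunningColumnTotal(int columnIndex) {
        this.columnIndex = columnIndex;
    }

    // Intended to be called from inside a batchProcessed(int) implementation,
    // passing the result of getColumnValuesAsList() and the rowsInThisBatch argument.
    void consume(List<List<String>> columnValues, int rowsInThisBatch) {
        List<String> column = columnValues.get(columnIndex);
        // Each column list is expected to hold exactly rowsInThisBatch elements.
        if (column.size() != rowsInThisBatch) {
            throw new IllegalStateException(
                "Expected " + rowsInThisBatch + " values, found " + column.size());
        }
        for (String value : column) {
            // Assumption of this sketch: null or empty values are skipped.
            if (value != null && !value.trim().isEmpty()) {
                total += Long.parseLong(value.trim());
            }
        }
        rowsSeen += rowsInThisBatch;
    }

    long getTotal() { return total; }

    long getRowsSeen() { return rowsSeen; }
}
```

Only the running total and row count are retained between batches, so the per-batch lists can be released as soon as the callback returns.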
-