public abstract class AbstractBatchedColumnProcessor<T extends Context> extends java.lang.Object implements Processor<T>, BatchedColumnReader<java.lang.String>
Processor implementation that stores values of columns in batches. Use this implementation in favor of AbstractColumnProcessor
when processing large inputs to avoid running out of memory.
Values parsed in each row will be split into columns of Strings. Each column has its own list of values.
During the execution of the process, the batchProcessed(int) method will be invoked after a given number of rows has been processed.
The user can access the lists with values parsed for all columns using the methods getColumnValuesAsList(),
getColumnValuesAsMapOfIndexes() and getColumnValuesAsMapOfNames().
After batchProcessed(int) is invoked, all values will be discarded and the next batch of column values will be accumulated.
This process repeats until there are no more rows in the input.
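The batching cycle described above can be sketched in plain Java. This is a minimal illustration rather than the library's implementation: `BatchSketch` and its `onBatch` callback are hypothetical names standing in for the processor and its batchProcessed(int) method, and reporting a partial final batch when the input ends is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical stand-in for a batched column processor; not the library's class.
class BatchSketch {
    private final int rowsPerBatch;
    private final IntConsumer onBatch;   // plays the role of batchProcessed(int)
    private final List<List<String>> columns = new ArrayList<>();
    private int batchCount;              // rows accumulated in the current batch
    private int batchesProcessed;        // batches already handed to the callback

    BatchSketch(int rowsPerBatch, IntConsumer onBatch) {
        this.rowsPerBatch = rowsPerBatch;
        this.onBatch = onBatch;
    }

    // Splits one row into per-column lists and fires the callback once the batch is full.
    void rowProcessed(String[] row) {
        for (int i = 0; i < row.length; i++) {
            while (columns.size() <= i) {
                columns.add(new ArrayList<>());
            }
            columns.get(i).add(row[i]);
        }
        batchCount++;
        if (batchCount >= rowsPerBatch) {
            flush();
        }
    }

    // Assumption: rows left over at the end of the input are reported as a smaller batch.
    void processEnded() {
        if (batchCount > 0) {
            flush();
        }
    }

    private void flush() {
        onBatch.accept(batchCount);
        columns.clear();                 // values are discarded after the callback returns
        batchCount = 0;
        batchesProcessed++;
    }

    List<List<String>> getColumnValuesAsList() {
        return columns;
    }

    int getBatchesProcessed() {
        return batchesProcessed;
    }
}
```

With `rowsPerBatch` set to 2 and three input rows, this sketch invokes the callback twice: once with 2 rows and once, at the end of the input, with the remaining row. Only one batch of values is held in memory at any time.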
See Also: AbstractParser, BatchedColumnReader, Processor

| Modifier and Type | Field and Description |
|---|---|
| private int | batchCount |
| private int | batchesProcessed |
| private int | rowsPerBatch |
| private ColumnSplitter<java.lang.String> | splitter |

| Constructor | Description |
|---|---|
| AbstractBatchedColumnProcessor(int rowsPerBatch) | Constructs a batched column processor configured to invoke the batchProcessed(int) method after a given number of rows has been processed. |

| Modifier and Type | Method | Description |
|---|---|---|
| abstract void | batchProcessed(int rowsInThisBatch) | Callback to the user, where the lists with values parsed for all columns can be accessed using the methods ColumnReader.getColumnValuesAsList(), ColumnReader.getColumnValuesAsMapOfIndexes() and ColumnReader.getColumnValuesAsMapOfNames(). |
| int | getBatchesProcessed() | Returns the number of batches already processed. |
| java.util.List<java.lang.String> | getColumn(int columnIndex) | Returns the values of a given column. |
| java.util.List<java.lang.String> | getColumn(java.lang.String columnName) | Returns the values of a given column. |
| java.util.List<java.util.List<java.lang.String>> | getColumnValuesAsList() | Returns the values processed for each column. |
| java.util.Map<java.lang.Integer,java.util.List<java.lang.String>> | getColumnValuesAsMapOfIndexes() | Returns a map of column indexes and their respective list of values parsed from the input. |
| java.util.Map<java.lang.String,java.util.List<java.lang.String>> | getColumnValuesAsMapOfNames() | Returns a map of column names and their respective list of values parsed from the input. |
| java.lang.String[] | getHeaders() | Returns the column headers. |
| int | getRowsPerBatch() | Returns the number of rows processed in each batch. |
| void | processEnded(T context) | This method will be invoked by the parser once, after the parsing process stopped and all resources were closed. |
| void | processStarted(T context) | This method will be invoked by the parser once, when it is ready to start processing the input. |
| void | putColumnValuesInMapOfIndexes(java.util.Map<java.lang.Integer,java.util.List<java.lang.String>> map) | Fills a given map associating each column index to its list of values. |
| void | putColumnValuesInMapOfNames(java.util.Map<java.lang.String,java.util.List<java.lang.String>> map) | Fills a given map associating each column name to its list of values. |
| void | rowProcessed(java.lang.String[] row, T context) | Invoked by the parser after all values of a valid record have been processed. |

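
The accessors listed above expose the same per-column lists in three shapes: a list of lists, a map keyed by column index, and a map keyed by column name. The sketch below illustrates how the two map views could be derived from the headers and the column lists. `ColumnMapsSketch` is a hypothetical helper, not a library class. That the `get...` variants create a new map and delegate to the `put...` variants is an assumption, as is the choice to skip columns that have no header.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mirrors the accessor names; not part of the library.
class ColumnMapsSketch {

    // Fills a caller-supplied map: column index -> values of that column.
    static void putColumnValuesInMapOfIndexes(List<List<String>> columns,
                                              Map<Integer, List<String>> map) {
        for (int i = 0; i < columns.size(); i++) {
            map.put(i, columns.get(i));
        }
    }

    // Fills a caller-supplied map: column name -> values of that column.
    // Assumption: columns without a matching header are skipped.
    static void putColumnValuesInMapOfNames(String[] headers,
                                            List<List<String>> columns,
                                            Map<String, List<String>> map) {
        for (int i = 0; i < columns.size() && i < headers.length; i++) {
            map.put(headers[i], columns.get(i));
        }
    }

    // Convenience variant that allocates the map itself.
    static Map<Integer, List<String>> getColumnValuesAsMapOfIndexes(List<List<String>> columns) {
        Map<Integer, List<String>> map = new LinkedHashMap<>();
        putColumnValuesInMapOfIndexes(columns, map);
        return map;
    }

    // Convenience variant that allocates the map itself.
    static Map<String, List<String>> getColumnValuesAsMapOfNames(String[] headers,
                                                                 List<List<String>> columns) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        putColumnValuesInMapOfNames(headers, columns, map);
        return map;
    }
}
```

The `put...` form lets the caller choose the map implementation, for example a sorted map, while the `get...` form is the shorter option when any map will do.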
private final ColumnSplitter<java.lang.String> splitter
private final int rowsPerBatch
private int batchCount
private int batchesProcessed
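public AbstractBatchedColumnProcessor(int rowsPerBatch)
Constructs a batched column processor configured to invoke the batchProcessed(int) method after a given number of rows has been processed.
Parameters: rowsPerBatch - the number of rows to process in each batch.

public void processStarted(T context)
Description copied from interface: Processor
This method will be invoked by the parser once, when it is ready to start processing the input.
Specified by: processStarted in interface Processor<T extends Context>
Parameters: context - A contextual object with information and controls over the current state of the parsing process

public void rowProcessed(java.lang.String[] row, T context)
Description copied from interface: Processor
Invoked by the parser after all values of a valid record have been processed.
Specified by: rowProcessed in interface Processor<T extends Context>
Parameters:
row - the data extracted by the parser for an individual record. Note that:
CommonSettings.setSkipEmptyLines(boolean)
Format.setComment(char) to '\0'
context - A contextual object with information and controls over the current state of the parsing process

public void processEnded(T context)
Description copied from interface: Processor
This method will be invoked by the parser once, after the parsing process stopped and all resources were closed. It will always be called by the parser: in case of errors, if the end of the input is reached, or if the user stopped the process manually using Context.stop().
Specified by: processEnded in interface Processor<T extends Context>
Parameters: context - A contextual object with information and controls over the state of the parsing process

public final java.lang.String[] getHeaders()
Description copied from interface: ColumnReader
Returns the column headers: either the headers defined in CommonSettings.getHeaders() or the headers parsed in the input when CommonSettings.getHeaders() equals true.
Specified by: getHeaders in interface ColumnReader<java.lang.String>

public final java.util.List<java.util.List<java.lang.String>> getColumnValuesAsList()
Description copied from interface: ColumnReader
Returns the values processed for each column.
Specified by: getColumnValuesAsList in interface ColumnReader<java.lang.String>

public final void putColumnValuesInMapOfNames(java.util.Map<java.lang.String,java.util.List<java.lang.String>> map)
Description copied from interface: ColumnReader
Fills a given map associating each column name to its list of values.
Specified by: putColumnValuesInMapOfNames in interface ColumnReader<java.lang.String>
Parameters: map - the map to hold the values of each column

public final void putColumnValuesInMapOfIndexes(java.util.Map<java.lang.Integer,java.util.List<java.lang.String>> map)
Description copied from interface: ColumnReader
Fills a given map associating each column index to its list of values.
Specified by: putColumnValuesInMapOfIndexes in interface ColumnReader<java.lang.String>
Parameters: map - the map to hold the values of each column

public final java.util.Map<java.lang.String,java.util.List<java.lang.String>> getColumnValuesAsMapOfNames()
Description copied from interface: ColumnReader
Returns a map of column names and their respective list of values parsed from the input.
Specified by: getColumnValuesAsMapOfNames in interface ColumnReader<java.lang.String>

public final java.util.Map<java.lang.Integer,java.util.List<java.lang.String>> getColumnValuesAsMapOfIndexes()
Description copied from interface: ColumnReader
Returns a map of column indexes and their respective list of values parsed from the input.
Specified by: getColumnValuesAsMapOfIndexes in interface ColumnReader<java.lang.String>

public java.util.List<java.lang.String> getColumn(java.lang.String columnName)
Description copied from interface: ColumnReader
Returns the values of a given column.
Specified by: getColumn in interface ColumnReader<java.lang.String>
Parameters: columnName - the name of the column in the input.

public java.util.List<java.lang.String> getColumn(int columnIndex)
Description copied from interface: ColumnReader
Returns the values of a given column.
Specified by: getColumn in interface ColumnReader<java.lang.String>
Parameters: columnIndex - the position of the column in the input (0-based).

public int getRowsPerBatch()
Description copied from interface: BatchedColumnReader
Returns the number of rows processed in each batch.
Specified by: getRowsPerBatch in interface BatchedColumnReader<java.lang.String>

public int getBatchesProcessed()
Description copied from interface: BatchedColumnReader
Returns the number of batches already processed.
Specified by: getBatchesProcessed in interface BatchedColumnReader<java.lang.String>

public abstract void batchProcessed(int rowsInThisBatch)
Description copied from interface: BatchedColumnReader
Callback to the user, where the lists with values parsed for all columns can be accessed using the methods ColumnReader.getColumnValuesAsList(), ColumnReader.getColumnValuesAsMapOfIndexes() and ColumnReader.getColumnValuesAsMapOfNames().
Specified by: batchProcessed in interface BatchedColumnReader<java.lang.String>
Parameters: rowsInThisBatch - the number of rows processed in the current batch. This corresponds to the number of elements of each list of each column.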
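
Because the column values are discarded once the callback returns, anything that must outlive a batch has to be aggregated or copied inside the callback. The following is a rough sketch of such a consumer, under stated assumptions: `RunningColumnSum` is a hypothetical class, its two-argument `batchProcessed` receives the column list directly (in a real subclass the list would come from getColumn(...)), and missing values are assumed to arrive as null.

```java
import java.util.List;

// Hypothetical per-batch consumer; not part of the library.
class RunningColumnSum {
    private long total;   // sum carried across batches
    private long rows;    // rows seen so far

    // Consumes the values of one numeric column for a single batch.
    void batchProcessed(int rowsInThisBatch, List<String> amounts) {
        // Each column list is expected to hold one element per row of the batch.
        if (amounts.size() != rowsInThisBatch) {
            throw new IllegalStateException("expected one value per row in the batch");
        }
        for (String amount : amounts) {
            if (amount != null) {   // assumption: missing values are null
                total += Long.parseLong(amount.trim());
            }
        }
        rows += rowsInThisBatch;
    }

    long getTotal() {
        return total;
    }

    long getRows() {
        return rows;
    }
}
```

Only the running totals are retained between batches, so memory use stays bounded by the batch size rather than by the size of the input.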