Streaming Data Processing with DataMapper

Especially useful when working with large datasets, Anypoint DataMapper supports streaming data input and output. For example, when reading information from a very large input file, you can use streaming you avoid having DataMapper load the whole file into memory. Instead, DataMapper works as a pipeline: it sequentially reads the file and store the data in a cache, performs data mapping, sends the output to the next transformer, empties the cache, then begins again. Using this procedure, DataMapper can parse a 500 MB CSV file using only about 75 MB of RAM, resulting in significant improvements in performance and resources utilization.

Anypoint DataMapper streaming supports CSV and fixed width input and output formats.
You can configure the size of the stream cache to optimize for performance.

DataMapper will continue to be fully supported in all current and future versions of Mule ESB 3.x, however it will be removed in Mule 4.0 in favour of the Transform Message component (based on DataWeave code). We recommend that if you wish to take advantage of the new capabilities of DataWeave or if you start new projects, upgrade now.

A migration tool is now included in Studio, which assists in converting a DataMapper map to DataWeave. Right click on a DataMapper, select Migrate to DataWeave, and follow the instructions.

If you don’t see DataMapper on your palette, you can enable it by going to Preferences → Anypoint Studio → Palette Profiles and ticking the checkbox Show deprecated Mule Components and Attributes.

Assumptions

This document assumes the reader is familiar with the Anypoint DataMapper Transformer. If you are not, start from the beginning: DataMapper User Guide and Reference. For a listing of all available tools in DataMapper, consult DataMapper Visual Reference.

Setting Streaming in DataMapper

To set the Streaming parameter in your data mapping flow, follow these steps:

In the DataMapper view, click the Properties icon (highlighted below).
DataMapper displays the general configuration dialog, shown below. Click Streaming.
In the Pipe Size input field, enter the desired size of the cache. The default is 2048. Bear in mind that:
- When working with files, the value of Pipe Size is expressed in bytes
- When working with collections, the value is expressed in number of collection elements

Handling Exceptions

If an exception occurs in the mapping, DataMapper stops the streaming engine as soon as possible. To avoid undesired consequences in case of failure (such as inserting only part of a row into a database) use Transactions.

Example

This example illustrates the use of the Streaming feature in Anypoint DataMapper.

An HTTP Connector receives a CSV file, then passes it to DataMapper. DataMapper maps the input data from CSV to POJO. A Database Connector inserts the data into an external database. In this scenario, DataMapper and the Database connector work in parallel as a pipeline, further improving application performance.

The image below displays the DataMapper view as configured for this example.

Finally, the output connector receives the list of maps, then incorporates each item as a value in the SQL query for the external database.

INSERT INTO Persons (name, city, email, phone) VALUES (#[payload.name], #[payload.city], #[payload.email], #[payload.phone])