Network data stream simulation with time range LDA pattern mining -- 2

Closed Posted 2 years ago Paid on delivery

This project involves the simulation of a SIEM system using Latent Dirichlet Allocation for IoT device streams. It can be implemented in R, Python, C++ or any relevant language that achieves the outcome.

Workflow

Input config > random & pattern generated content streams > stream chunks > LDA parser > output pattern frequency & topics per stream

Data Generation

Input config > random & pattern generated content streams

The generator should be configurable and able to create simulated network data streams. Each stream emits random filler content interleaved with content generated from the config file, which provides:

1. stream information

2. string and regex patterns to include in the stream (generator fills the regex with matching values)

3. occurrence frequency (range 0 to 10), which is the number of generated string and regex pattern instances to include per minute. The scheduling does not have to be sophisticated; the resulting rates just need to be noticeably different.
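The pattern filling and the per-minute frequency described above can be sketched as follows. This is a minimal illustration, not a required design: it assumes hand-written fillers for two of the example fields rather than a general regex-to-string generator, and the names fill_ip, fill_user and emission_offsets are hypothetical. The two regexes are the IP_EXT and USER expressions from the example config and are used only to check the generated values.

```python
import random
import re
import string

# Regexes copied from the example config (IP_EXT and USER fields).
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IP_RE = re.compile(OCTET + r"(\." + OCTET + r"){3}")
USER_RE = re.compile(r"^[a-z0-9_-]{3,15}$")

def fill_ip(rng):
    """Hypothetical filler: four random octets joined by dots."""
    return ".".join(str(rng.randint(0, 255)) for _ in range(4))

def fill_user(rng):
    """Hypothetical filler: 3 to 15 characters from [a-z0-9_-]."""
    alphabet = string.ascii_lowercase + string.digits + "_-"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(3, 15)))

def emission_offsets(frequency, rng):
    """Pick `frequency` random second offsets within one minute, sorted."""
    return sorted(rng.uniform(0, 60) for _ in range(frequency))

rng = random.Random(42)  # seeded only to make the sketch repeatable
line = f"IP_EXT: {fill_ip(rng)} USER: {fill_user(rng)}"
offsets = emission_offsets(2, rng)  # when to emit within the current minute
```

Drawing the offsets at random each minute keeps streams with different frequency values visibly different without any further scheduling logic. A regex-driven generator library could replace the hand-written fillers if arbitrary patterns must be supported.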

The generator can be started and stopped.
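One way to make a generator startable and stoppable is a background thread guarded by an event flag. The sketch below is an assumption about how this could be done; the class name StreamGenerator, the make_line callback and the in-memory lines list (standing in for a socket or file) are all hypothetical.

```python
import threading
import time

class StreamGenerator:
    """Hypothetical generator wrapper that can be started and stopped."""

    def __init__(self, make_line, interval_seconds=0.01):
        self._make_line = make_line        # callable returning one line of content
        self._interval = interval_seconds  # pause between emitted lines
        self._stop = threading.Event()
        self._thread = None
        self.lines = []                    # stand-in for a socket or file sink

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self.lines.append(self._make_line())
            # wait() returns early if stop() is called during the pause
            self._stop.wait(self._interval)

gen = StreamGenerator(lambda: "Lorem ipsum dolor sit amet")
gen.start()
time.sleep(0.05)
gen.stop()
```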

Example input configuration for 2 streams in JSON format:

/ input/[login to view URL]

{
  {
    "name": "endpoint1",
    "ip": [login to view URL],
    "port": 345,
    {
      "pattern": "IP_EXT: '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}' MSG: ^#[^ !@#$%^&*(),.?\":{}|<>]*$ USER: ^[a-z0-9_-]{3,15}$",
      "frequency": 2
    },
    {
      "pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?\":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$",
      "frequency": 5
    }
  },
  {
    "name": "syslog1",
    "ip": [login to view URL],
    "port": 534,
    {
      "pattern": "IP_EXT: '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}' MSG: ^#[^ !@#$%^&*(),.?\":{}|<>]*$ USER: ^[a-z0-9_-]{3,15}$",
      "frequency": 2
    },
    {
      "pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?\":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$",
      "frequency": 5
    }
  }
}
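Each "pattern" value in the config packs several labelled regexes into one string. A possible way to split such a string into its fields is sketched below. It assumes that a label is an uppercase word followed by a colon and a space, which holds for the example but is not stated in the specification. The function name split_pattern is hypothetical.

```python
import re

# Assumption: a label is an uppercase word (letters/underscores) followed by ": ".
LABEL_RE = re.compile(r"(?:^|\s)([A-Z][A-Z_]*):\s")

def split_pattern(pattern):
    """Split 'LABEL: regex LABEL: regex ...' into an ordered {label: regex} dict."""
    parts = LABEL_RE.split(pattern)
    # parts = [text before first label, label1, regex1, label2, regex2, ...]
    fields = {}
    for label, regex in zip(parts[1::2], parts[2::2]):
        regex = regex.strip().strip("'")  # the IP_EXT regex is quoted in the config
        # Drop the ^/$ anchors so the field can be searched inside a longer line.
        if regex.startswith("^"):
            regex = regex[1:]
        if regex.endswith("$"):
            regex = regex[:-1]
        fields[label] = regex
    return fields

example = 'PAYLOAD: ^ABC_[^ !@#$%^&*(),.?":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$'
fields = split_pattern(example)
```

The anchors are removed because the generated values appear in the middle of filler text, where a regex anchored to the start and end of the line would never match.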

Sample stream chunk.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed euismod eros a lectus porttitor, vitae aliquet magna ullamcorper. Praesent in enim non magna vehicula faucibus. Vestibulum lacinia velit ut dolor aliquet tincidunt. IP_EXT: [login to view URL] MSG: #abyx USER: das-dkjh Ut consectetur hendrerit massa vel tempus. Nulla sit amet libero id felis lacinia accumsan. PAYLOAD: ABC_aS57dasd ID: 42d8ffe6-8a65-416c-ac92-d5826315faa6 In dictum porta magna sed lectus venenatis. Aliquam accumsan molestie augue, sit lectus amet vulputate metus tristique et. Ut a lectus erat elit….

Regex specifications from

[login to view URL]

[login to view URL]

[login to view URL]

Stream Parser

stream chunks > LDA parser > output pattern frequency & topics per stream

The streams are read by a parser application, which reads each input stream for a configurable span of time (e.g. 30 seconds) and treats what it receives as one input chunk. You must use a Latent Dirichlet Allocation package or method to analyze the data and create/append to 3 log files per stream. Each run writes to a new output folder named with the timestamp of when the run began.

1. the found matching patterns log (use the input file to identify patterns),

2. the count of the patterns in that timespan log, and

3. the log of up to 10 highest-frequency single-string terms (LDA topics; occurrence > 1 and not part of the regex patterns?)
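The third log could be produced along the following lines. This is one reading of the requirement, not a confirmed one: each line of a chunk is treated as a document, a small LDA model is fitted, and each topic reports up to 10 words that occur more than once in the chunk. The sketch uses a plain collapsed Gibbs sampler so that it runs with the standard library only; in practice an LDA package such as scikit-learn or gensim would normally be used instead. The names fit_lda and top_terms and all hyperparameter defaults are assumptions.

```python
import random
import re
from collections import Counter

def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        zs = [rng.randrange(n_topics) for _ in doc]
        assign.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]  # remove the token's current assignment
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights)[0]
                assign[d][i] = z  # record the resampled topic
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

def top_terms(lines, exclude=(), max_terms=10, **lda_args):
    """Per topic: up to `max_terms` words seen more than once in the chunk."""
    docs = [[w for w in re.findall(r"[a-z]+", line.lower()) if w not in exclude]
            for line in lines]
    totals = Counter(w for doc in docs for w in doc)
    topics = fit_lda(docs, **lda_args)
    return [[w for w, c in topic.most_common() if c > 0 and totals[w] > 1][:max_terms]
            for topic in topics]

lines = [
    "lectus magna lectus dolor",
    "magna lectus amet",
    "payload payload id",
]
topics = top_terms(lines, n_topics=2)
```

The exclude argument is where tokens that belong to the configured patterns (for example the field labels) could be filtered out, if the "not in regex patterns" condition is meant to apply.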

Example

/ output
    / [login to view URL]
    / run_timestamp1
    / run_timestamp2
    / run_1624313100    # start time of log run
        / endpoint1
            / [login to view URL]
            / [login to view URL]
            / [login to view URL]
        / syslog1
            / [login to view URL]
            / [login to view URL]
            / [login to view URL]
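The folder layout above could be created as sketched here. The log file names are not visible in the listing, so "patterns.log" below is a hypothetical name, as are the helper names make_run_dirs and append_line; a temporary directory stands in for the real output root.

```python
import tempfile
import time
from pathlib import Path

def make_run_dirs(output_root, stream_names, start_time=None):
    """Create output_root/run_<start>/<stream>/ and return the run folder."""
    start = int(time.time() if start_time is None else start_time)
    run_dir = Path(output_root) / f"run_{start}"
    for name in stream_names:
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir

def append_line(stream_dir, filename, text):
    """Append one line to a log file, creating the file on first use."""
    with open(Path(stream_dir) / filename, "a", encoding="utf-8") as fh:
        fh.write(text + "\n")

root = tempfile.mkdtemp()  # throwaway root for the demo
run_dir = make_run_dirs(root, ["endpoint1", "syslog1"], start_time=1624313100)
# "patterns.log" is a hypothetical name; the listing does not show the real ones.
append_line(run_dir / "endpoint1", "patterns.log",
            "1624313117: PAYLOAD: ABC_aS57dasd ID: 42d8ffe6-8a65-416c-ac92-d5826315faa6")
```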

{
  "chunk_timespan_seconds": 30
}
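The chunk timespan setting can be applied by bucketing timestamped stream content into fixed windows counted from the run start. The sketch below assumes the parser already has (timestamp, text) pairs in hand, whereas a live parser would read from the stream for the configured span. The function name chunk_by_timespan is hypothetical.

```python
import json
from collections import defaultdict

config = json.loads('{"chunk_timespan_seconds": 30}')

def chunk_by_timespan(events, run_start, span_seconds=30):
    """Group (timestamp, text) events into consecutive windows of span_seconds."""
    chunks = defaultdict(list)
    for ts, text in events:
        index = int((ts - run_start) // span_seconds)  # 0 for the first window
        chunks[index].append((ts, text))
    return dict(chunks)

events = [(1624313104, "a"), (1624313117, "b"), (1624313125, "c"), (1624313150, "d")]
chunks = chunk_by_timespan(events, 1624313100, config["chunk_timespan_seconds"])
# chunks[0] holds the first three events, chunks[1] holds the one at 1624313150
```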

endpoint1/[login to view URL] (any matched pattern with timestamp)

1624313104: IP_EXT: [login to view URL] MSG: #abyx USER: das-dkjh

1624313117: PAYLOAD: ABC_aS57dasd ID: 42d8ffe6-8a65-416c-ac92-d5826315faa6

1624313125: IP_EXT: [login to view URL] MSG: #dasdadafg USER: afs-adsfsfsdd

1624313150: IP_EXT: [login to view URL] MSG: #dfhdfg USER: sdfff-gdfg

endpoint1/[login to view URL] (30 second interval finding summary)
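Finding the configured patterns in a chunk and counting them per interval could look like the sketch below. The regexes are the ones from the example config with the ^ and $ anchors dropped, so that they match inside a longer line. The pattern names, the function name scan_chunk and the address 192.0.2.1 (a documentation placeholder, since the addresses in the posting are not visible) are assumptions, and the format of the count log is not shown in the posting, so only the counts themselves are computed.

```python
import re
from collections import Counter

# Regexes from the example config, unanchored. The dict keys are hypothetical names.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
PATTERNS = {
    "ip_msg_user": re.compile(
        r"IP_EXT: " + OCTET + r"(?:\." + OCTET + r"){3}"
        + r' MSG: #[^ !@#$%^&*(),.?":{}|<>]* USER: [a-z0-9_-]{3,15}'
    ),
    "payload_id": re.compile(
        r'PAYLOAD: ABC_[^ !@#$%^&*(),.?":{}|<>]* ID: [a-z0-9_-]{30,150}'
    ),
}

def scan_chunk(events):
    """Return (log_lines, counts) for a list of (timestamp, text) events."""
    log_lines, counts = [], Counter()
    for ts, text in events:
        for name, regex in PATTERNS.items():
            for match in regex.finditer(text):
                log_lines.append(f"{ts}: {match.group(0)}")  # "timestamp: match"
                counts[name] += 1
    return log_lines, counts

events = [
    (1624313104, "Lorem ipsum IP_EXT: 192.0.2.1 MSG: #abyx USER: das-dkjh Ut consectetur"),
    (1624313117, "massa PAYLOAD: ABC_aS57dasd ID: 42d8ffe6-8a65-416c-ac92-d5826315faa6 In dictum"),
]
log_lines, counts = scan_chunk(events)
```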

Skills: Python, R Programming Language, C++ Programming

Project ID: #30864302

About the project

5 proposals Remote project Active 2 years ago

5 freelancers are bidding on average $294 for this job

merinsinha

Senior R and Python expert with 9+ years of experience in these fields. I can deliver good-quality work. I have read the guidelines for your work. I believe that I can provide the best-quality work you are anticipating from…

$250 USD in 5 days
(31 Reviews)
4.9
luguanhuang

Hello, I have reached second-level seller status on Fiverr and earn about two thousand dollars a month. I have done Windows and Linux C++ project tasks for many students and companies. I hope you can consider…

$60 USD in 5 days
(4 Reviews)
3.3
SVjxColh

Hi. I did a very similar project for another client around 5-6 months ago. I am sure I can do the same for you. Kindly drop me a message in chat so we can discuss this in more detail.

$60 USD in 8 days
(0 Reviews)
0.0
denystamene

Hi. As I am a C++, Python and R expert, I can surely complete your project in a short time. Your task suits me well. Let me know more about it. If you hire me, I'll be sure to put my full time into your requireme…

$1000 USD in 15 days
(0 Reviews)
0.0