Named Entity Recognition (NER) is a common application for LLMs involving extraction of structured information like names, emails, phone numbers, and addresses from unstructured text. While many larger hosted models like ChatGPT, Claude, and DeepSeek handle this task well, local LLMs can often be used for this and other practical, real-world task, in this case with the help of In-Context Learning (covered in the previous post).

LLM Workstation Setup

We’ll be using Microsoft Phi-4, quantized to 8bpw, running locally on a 16GB NVIDIA A4000 GPU. This setup provides fast output and leaves enough VRAM for cache and activations. Consumer GPUs like the 16GB NVIDIA 5070 Ti and even older models like the 24GB 3090 or 4090 are also sufficient for these tasks.

These smaller models also run fairly well on modern desktop PCs using just a CPU and RAM via llama.cpp (code-based) or LM Studio (GUI), albeit much more slowly.

Prompt

In the prompt below, we build upon techniques we’ve explored previously, including In-Context Learning (ICL), to 1) give the model precise instructions and 2) examples of the output we expect:

Objective:

Extract structured contact information from the provided unformatted text and convert it into CSV format with specified columns. Capture all available information, even if some fields are missing. Format all phone numbers as (XXX) XXX-XXXX and ensure city and state are captured separately from the address.

Columns to Extract:

- Name: Full name of the person (e.g., "John Doe")
- Email: Email address of the person (e.g., "johndoe@email.com")
- Phone: Phone number formatted as "(XXX) XXX-XXXX" (e.g., "(555) 123-4567")
- Address: Street address without city, state, or ZIP (e.g., "432 Oak St")
- City: Full city name (e.g., "Springfield")
- State: Two-letter state abbreviation (e.g., "IL")
- Zip: Postal ZIP code (e.g., "60901")

Instructions:

1. Read the Unformatted Text Carefully: Identify and extract each piece of information based on common patterns and keywords.

2. Use Patterns for Extraction:
   - Phone Numbers: Look for patterns like "(XXX) XXX-XXXX" or "XXX-XXX-XXXX" and reformat to "(XXX) XXX-XXXX".
   - Email Addresses: Identify email patterns using "example@domain.com".
   - Addresses: Extract street addresses and separate them from city, state, and ZIP code information.

3. Identify Keywords and Phrases:
   - Use common indicators like Email:, Phone:, Address:, City:, State:, and ZIP: to locate relevant data.

4. Format Data into CSV:
   - Enclose each field in double quotes.
   - Separate fields with commas.
   - Ensure no extra spaces around commas.

5. Handle Missing Data:
   - Leave fields blank if no information is available.
   - Include city and state if known, even if other fields are blank.

Error Handling:

- Flag Entries for Review: Identify entries with incomplete data for manual review.
- Correct Misplaced or Incorrect Data: Adjust any misplaced data or incorrect formatting according to the specified rules.

Example:

Given the unformatted text:

John Doe - johndoe@email.com
(555) 123-4567
432 Oak St, Springfield, IL

The extracted CSV should be:

"Name","Email","Phone","Address","City","State","Zip"
"John Doe","johndoe@email.com","(555) 123-4567","432 Oak St","Springfield","IL",""

Data to process:

[unstructured data]

Sample Unstructured Data

Now that we have our prompt, we need data. Below are fictional names and contact information as an example of completely freeform (unstructured) data, and it’s a mess — not all fields are consistently used or even present:

John Doe - johndoe@email.com
(555) 123-4567
432 Oak St, Springfield, IL
Susan Taylor
susan_taylor91@gmail.com
(212) 555-7890
912 Westbrook Ave, Denver, CO 80203
Michael Smith - 415.999.2345
37 Maple Dr, SF, CA 94115
Rebecca Johnson
rebeccaj@workmail.com
8772226789
1180 Pine Road, Boston, MA
Kevin Miller
404-555-9988
726 Highland Blvd, Atlanta, GA 30306
Julia Martinez - julia.m@example.org
(305) 4443211
520 Palm Avenue, Miami, FL 33139
Samuel Nguyen
800-333-1212
293 Sunset Blvd, Los Angeles, CA 90028
Emily Brown - ebrown42@service.net
503.777.6543
661 Cedar Ln, Portland, OR 97205
Patrick O'Connor
612 555 7654
84 Ridgeway Street, Minneapolis, MN 55403
Linda Wilson - linda_wilson@eduplace.edu
(919) 888-9876
920 Elm Street, Raleigh, NC 27601
James Anderson
7735554321
301 Birch Avenue, Chicago, IL 60614
Olivia Harris - olivia_harris@nonprofit.org
(415) 666-7890
105 Mission Street, San Francisco, CA 94103
Charles Wright
718-222-5432
785 Grand Avenue, Brooklyn, NY 11238
Lauren Mitchell - lauren.mitchell@healthcare.org
(202) 555-2121
3901 Willow Road, Washington, DC 20007
Brandon Torres
2815557890
675 Horizon St, Houston, TX 77002
Matthew Lewis - matthew.lewis@lawfirm.com
213-888-5678
1408 Broadway, Los Angeles, CA 90015
Deborah Clark
(617) 4448901
88 Research Way, Cambridge, MA 02139
Richard Scott - richard.scott@startup.io
415 777-9999
219 Innovation Drive, Palo Alto, CA 94306
Natalie Moore
natalie.moore@fashionhouse.com
500 Madison Ave, New York, NY 10022
Daniel Watson - daniel.watson@architecturefirm.com
602-555-4444
351 Desert Road, Phoenix, AZ 85004
Chloe Adams
(312) 666-7777
999 Lake Shore Drive, Chicago, IL 60611
Aaron Bennett - aaron.bennett@musiclabel.com
615-555-1234
200 Nashville Ave, Nashville, TN 37201
Tiffany Roberts
tiffany.roberts@realestate.net
Charlotte, NC
Robert Young - robert.young@logistics.com
(303) 999-8765
810 Ridge Crest, Denver, CO 80211
Hannah Green
818-444-9999
120 Sunset Plaza, Hollywood, CA 90069
Steven Hall - steven.hall@bankingcorp.com
(702) 555-6666
789 Fremont Street, Las Vegas, NV 89101
Katherine Evans
(215) 888-2222
55 Chestnut Avenue, Philadelphia, PA 19103
David Hernandez - david.hernandez@automotive.com
313-555-5555
1100 Motorway Blvd, Detroit, MI 48226
Samantha Lee
samantha.lee@hotelchain.com
(808) 777-3456
Honolulu, HI 96813
Keith King - keith.king@university.edu
(801) 222-8888
965 Mountain Road, Salt Lake City, UT 84101
Galena Carter
gelena.carter@cateringsolutions.com
(504) 555-1234
389 Bayou Lane, New Orleans, LA 70116
Timothy Cook - timothy.cook@constructionco.com
(405) 888-4444
333 Brick Road, Oklahoma City, OK 73102
Donna Edwards
donna.edwards@insurance.net
(317) 666-5555
470 Monument Circle, Indianapolis, IN 46204
Lucas Ward - lucas.ward@governmentagency.gov
(907) 777-9999
1200 Alaska Way, Anchorage, AK 99501
Emma Collins
emma.collins@techstartup.com
Austin, TX 78701
Nicholas Morris - nicholas.morris@lawyersfirm.com
(859) 222-1212
267 Bluegrass Lane, Lexington, KY 40507
Jacob Thomas
jacob.thomas@telecomcorp.com
(919) 444-8888
780 Signal Road, Durham, NC 27701
Rachel Anderson - rachel.anderson@hospitalnetwork.org
(414) 555-4321
620 Milwaukee Ave, Milwaukee, WI 53202
Logan Walker
907-888-6666
100 Glacier Street, Juneau, AK 99801
Grayson Martin - grayson.martin@energygroup.com
(406) 555-8765
289 Power Road, Billings, MT 59101
Audrey Scott
audrey.scott@university.edu
810 Campus Drive, Iowa City, IA 52242
Elijah Rodriguez - elijah.rodriguez@agriculture.usda.gov
(701) 555-3333
205 Prairie Lane, Fargo, ND 58102
Danielle Parker
danielle.parker@librarysystem.org
(605) 444-1212
905 Capitol Blvd, Pierre, SD 57501
Connor Hughes - connor.hughes@researchinstitute.edu
(208) 888-7777
340 Research Circle, Boise, ID 83702
Victoria Reed
victoria.reed@transportauthority.gov
(307) 555-8888
410 Frontier Avenue, Cheyenne, WY 82001

First Pass

Let’s see what happens when we pass the unstructured contact data with our prompt to create a structured CSV-style result that we can easily import into a database, worksheet, or CRM application:

"Name","Email","Phone","Address","City","State","Zip"
"John Doe","johndoe@email.com","(555) 123-4567","432 Oak St","Springfield","IL",""
"Susan Taylor","susan_taylor91@gmail.com","(212) 555-7890","912 Westbrook Ave","Denver","CO","80203"
"Michael Smith","","(415) 999-2345","37 Maple Dr","SF","CA","94115"
"Rebecca Johnson","rebeccaj@workmail.com","(877) 222-6789","1180 Pine Road","Boston","MA",""
"Kevin Miller","","(404) 555-9988","726 Highland Blvd","Atlanta","GA","30306"
"Julia Martinez","julia.m@example.org","(305) 444-3211","520 Palm Avenue","Miami","FL","33139"
"Samuel Nguyen","","(800) 333-1212","293 Sunset Blvd","Los Angeles","CA","90028"
"Emily Brown","ebrown42@service.net","(503) 777-6543","661 Cedar Ln","Portland","OR","97205"
"Patrick O'Connor","","(612) 555-7654","84 Ridgeway Street","Minneapolis","MN","55403"
"Linda Wilson","linda_wilson@eduplace.edu","(919) 888-9876","920 Elm Street","Raleigh","NC","27601"
"James Anderson","","(773) 555-4321","301 Birch Avenue","Chicago","IL","60614"
"Olivia Harris","olivia_harris@nonprofit.org","(415) 666-7890","105 Mission Street","San Francisco","CA","94103"
"Charles Wright","","(718) 222-5432","785 Grand Avenue","Brooklyn","NY","11238"
"Lauren Mitchell","lauren.mitchell@healthcare.org","(202) 555-2121","3901 Willow Road","Washington","DC","20007"
"Brandon Torres","","(281) 555-7890","675 Horizon St","Houston","TX","77002"
"Matthew Lewis","matthew.lewis@lawfirm.com","(213) 888-5678","1408 Broadway","Los Angeles","CA","90015"
"Deborah Clark","","(617) 444-8901","88 Research Way","Cambridge","MA","02139"
"Richard Scott","richard.scott@startup.io","(415) 777-9999","219 Innovation Drive","Palo Alto","CA","94306"
"Natalie Moore","natalie.moore@fashionhouse.com","","500 Madison Ave","New York","NY","10022"
"Daniel Watson","daniel.watson@architecturefirm.com","(602) 555-4444","351 Desert Road","Phoenix","AZ","85004"
"Chloe Adams","","(312) 666-7777","999 Lake Shore Drive","Chicago","IL","60611"
"Aaron Bennett","aaron.bennett@musiclabel.com","(615) 555-1234","200 Nashville Ave","Nashville","TN","37201"
"Tiffany Roberts","tiffany.roberts@realestate.net","","Charlotte","NC",""
"Robert Young","robert.young@logistics.com","(303) 999-8765","810 Ridge Crest","Denver","CO","80211"
"Hannah Green","","(818) 444-9999","120 Sunset Plaza","Hollywood","CA","90069"
"Steven Hall","steven.hall@bankingcorp.com","(702) 555-6666","789 Fremont Street","Las Vegas","NV","89101"
"Katherine Evans","","(215) 888-2222","55 Chestnut Avenue","Philadelphia","PA","19103"
"David Hernandez","david.hernandez@automotive.com","(313) 555-5555","1100 Motorway Blvd","Detroit","MI","48226"
"Samantha Lee","samantha.lee@hotelchain.com","","Honolulu","HI","96813"
"Keith King","keith.king@university.edu","(801) 222-8888","965 Mountain Road","Salt Lake City","UT","84101"
"Galena Carter","gelena.carter@cateringsolutions.com","(504) 555-1234","389 Bayou Lane","New Orleans","LA","70116"
"Timothy Cook","timothy.cook@constructionco.com","(405) 888-4444","333 Brick Road","Oklahoma City","OK","73102"
"Donna Edwards","donna.edwards@insurance.net","(317) 666-5555","470 Monument Circle","Indianapolis","IN","46204"
"Lucas Ward","lucas.ward@governmentagency.gov","(907) 777-9999","1200 Alaska Way","Anchorage","AK","99501"
"Emma Collins","","","Austin","TX","78701"
"Nicholas Morris","nicholas.morris@lawyersfirm.com","(859) 222-1212","267 Bluegrass Lane","Lexington","KY","40507"
"Jacob Thomas","jacob.thomas@telecomcorp.com","(919) 444-8888","780 Signal Road","Durham","NC","27701"
"Rachel Anderson","rachel.anderson@hospitalnetwork.org","(414) 555-4321","620 Milwaukee Ave","Milwaukee","WI","53202"
"Logan Walker","","(907) 888-6666","100 Glacier Street","Juneau","AK","99801"
"Grayson Martin","grayson.martin@energygroup.com","(406) 555-8765","289 Power Road","Billings","MT","59101"
"Audrey Scott","audrey.scott@university.edu","","810 Campus Drive","Iowa City","IA","52242"
"Elijah Rodriguez","elijah.rodriguez@agriculture.usda.gov","(701) 555-3333","205 Prairie Lane","Fargo","ND","58102"
"Danielle Parker","danielle.parker@librarysystem.org","(605) 444-1212","905 Capitol Blvd","Pierre","SD","57501"
"Connor Hughes","connor.hughes@researchinstitute.edu","(208) 888-7777","340 Research Circle","Boise","ID","83702"
"Victoria Reed","victoria.reed@transportauthority.gov","(307) 555-8888","410 Frontier Avenue","Cheyenne","WY","82001"

Looks pretty good, but we’re not done yet. Let’s use a more powerful LLM (ChatGPT 4o) just to validate our model and prompt effectiveness:

I'll compare the final CSV output with the source data to identify any errors or omissions (i.e., missing or incorrectly formatted data).

Issues Noticed:
City/State Missing for Some Entries
Example:
"Emma Collins","","","Austin","TX","78701" → In the source, "Austin, TX" was not in structured form, but it was inferred. This is acceptable.
"Tiffany Roberts","tiffany.roberts@realestate.net","","Charlotte","NC","" → No zip code in source, so this is fine.
Address Formatting Issues
"Michael Smith","","(415) 999-2345","37 Maple Dr","SF","CA","94115"
Error: "SF" should be "San Francisco" to align with standard full city names.
Email, Phone, or Address Missing Where Present in Source
No missing entries detected. The model correctly captured emails, phone numbers, and addresses where available.
Incorrect or Inconsistent Abbreviations
"San Francisco" was reduced to "SF", but all other cities remain full names. This should be standardized.
Verdict:
The output is very accurate, with only one main issue (SF instead of San Francisco). If standardization is necessary, ensure city names are always fully written out. Other than that, the formatting, structure, and missing values handling appear correct.

This is to be expected; LLMs generate output in one pass and one token at a time, and therefore can’t go back to validate in a single pass. However, we can perform a second pass to tighten things up.

Second Pass

A quick second pass with a simple refining prompt can often catch and either report or correct most errors. No tricks — just instruct the model to fix formatting or re-check for missing values. This generally results in very accurate output.

Objective:

Perform a second-pass validation of structured contact data extracted from unstructured text. Compare the original source data with the first-pass CSV output to detect and correct:

Missing information that exists in the source but wasn’t extracted
Formatting inconsistencies (especially in phone numbers and addresses)
Incorrect parsing (e.g., names combined with emails, ZIP codes in wrong fields)
Incomplete or misclassified fields (e.g., city/state misidentified as address)
Duplicates or skipped entries
Instructions:

Compare Entries Side by Side:
Review the original unstructured text alongside the structured data.
Identify discrepancies in names, emails, phone numbers, addresses, cities, states, and ZIP codes.
Check for Formatting Issues:
Ensure all phone numbers follow the "(XXX) XXX-XXXX" format.
Ensure addresses do not include city/state/ZIP.
Validate emails match correct syntax (name@domain.com).
Detect Omissions:
Identify any missing contact entries present in the source but absent in the structured data.
Flag and list rows where data was skipped or fields were left blank despite being available.
Correct Misplaced Data:
Realign any data that's in the wrong field (e.g., ZIP in address, phone in name).
Split or reassign fields as needed to match the correct format.
Output Revised CSV:
Provide a corrected version of the CSV with all updates.
Clearly mark rows that were corrected, added, or flagged for manual review (if applicable).
Summary Report:
List the number of corrections made by category (e.g., phone formatting, missing emails).
Flag entries that still need manual attention.
Inputs:

Source Data: [unformatted text]
First-Pass Output: [CSV or structured table]

Now, let’s compare again:

Here are the phone number formatting discrepancies found between the raw processed data and the provided formatted CSV:

Name	Raw Format	Formatted CSV
Rebecca Johnson	8772226789	(877) 222-6789
Kevin Miller	404-555-9988	(404) 555-9988
Julia Martinez	(305) 4443211	(305) 444-3211
Samuel Nguyen	800-333-1212	(800) 333-1212
Emily Brown	503.777.6543	(503) 777-6543
These are purely formatting issues—no missing or incorrect data was found otherwise.

Even after the second pass, we do have some minor issues, which is why human review is still essential for verifying results, especially before importing structured data into other systems.

Conclusion

Named Entity Recognition is an excellent, productive use case for LLMs, even possible with smaller on-premises models to reduce cost and address data privacy and PII concerns. While not perfect, combining careful prompting with realistic expectations and human review makes it a handy tool to have in your LLM toolbox.

Leave a comment

Dave Ziegler

I’m a full-stack AI/LLM practitioner and solutions architect with 30+ years enterprise IT, application development, consulting, and technical communication experience.

While I currently engage in LLM consulting, application development, integration, local deployments, and technical training, my focus is on AI safety, ethics, education, and industry transparency.

Open to opportunities in technical education, system design consultation, practical deployment guidance, model evaluation, red teaming/adversarial prompting, and technical communication.

My passion is bridging the gap between theory and practice by making complex systems comprehensible and actionable.

Founding Member, AI Mental Health Collective

Community Moderator / SME, The Human Line Project

Let’s connect

Discord: AightBits